Extraction of Training Data from Fine-Tuned Large Language Models
This is my 5th Year Master of Science thesis in the Computer Science Department at Carnegie Mellon University, presented on April 29, 2024.
Large Language Models have been shown to perform well on natural language tasks, even those they were not explicitly trained to perform. Fine-tuning these models on smaller datasets has become a popular technique to achieve high performance on specific tasks. However, fine-tuning can lead to the memorization of training data, which may be a privacy concern.
In this work, I investigated the extraction of training data from fine-tuned large language models. I conducted a series of experiments to determine how easily private training data can be extracted from fine-tuned models using different data extraction techniques. I also investigated how the following factors affect the ease of data extraction:
- Amount of training data used for fine-tuning
- Number of training epochs
- Length and content of each training sample
- Fine-tuning technique and parameters used
Key Findings
- With direct access to the model, extracting training data is straightforward unless the model was trained only on prompt completions.
- The proportion of data that can be extracted increases with the amount of data used for fine-tuning (holding the number of epochs constant).
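One simple extraction technique consistent with this setup is a prefix-continuation probe: prompt the model with the start of a suspected training sample and check whether it reproduces the rest verbatim. The sketch below is illustrative, not the exact method from the thesis; the `generate` callable, the half-and-half sample split, and the verbatim-match criterion are all assumptions.

```python
def split_sample(sample: str, prefix_frac: float = 0.5):
    """Split a training sample into a prompt prefix and the suffix to recover."""
    cut = int(len(sample) * prefix_frac)
    return sample[:cut], sample[cut:]

def extraction_rate(generate, samples, prefix_frac: float = 0.5) -> float:
    """Fraction of samples whose held-out suffix the model reproduces verbatim.

    `generate` is any callable mapping a prompt string to a completion string,
    e.g. a wrapper around a fine-tuned model's generation API (hypothetical here).
    """
    hits = 0
    for sample in samples:
        prefix, suffix = split_sample(sample, prefix_frac)
        completion = generate(prefix)
        if suffix in completion:  # count a hit only on an exact verbatim match
            hits += 1
    return hits / len(samples)
```

In practice `generate` would call the fine-tuned model with greedy or low-temperature decoding; a higher extraction rate on training samples than on held-out samples is the signal that the model has memorized its fine-tuning data.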
This work has implications for the privacy of individuals whose data is used for fine-tuning, as well as for businesses or groups that use fine-tuned models in public-facing software.
Thesis Committee
- Matt Fredrikson (Chair)
- Yuvraj Agarwal
Presentation details: April 29, 2024, 1:00–2:00pm, Gates Hillman 9115, Carnegie Mellon University
