Extraction of Training Data from Fine-Tuned Large Language Models
This is my 5th Year Master of Science thesis in the Computer Science Department at Carnegie Mellon University, presented on April 29, 2024.
Large Language Models have been shown to perform well on natural language tasks, even those they were not explicitly trained to perform. Fine-tuning these models on smaller datasets has become a popular technique to achieve high performance on specific tasks. However, fine-tuning can lead to the memorization of training data, which may be a privacy concern.
In this work, I investigated the extraction of training data from fine-tuned large language models. I conducted a series of experiments to determine how easily private training data can be extracted from fine-tuned models using different data extraction techniques. I also investigated how the following factors affect the ease of data extraction:
- Amount of training data used for fine-tuning
- Number of training epochs
- Length and content of each training sample
- Fine-tuning technique and parameters used
Key Findings
- With direct access to the model, extracting training data is straightforward unless the model was trained only on prompt completions.
- The proportion of data that can be extracted increases with the amount of data used for fine-tuning (holding the number of epochs constant).
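One simple extraction technique consistent with this setup is a prefix-continuation probe: prompt the model with the start of a suspected training sample and check whether it reproduces the rest verbatim. The sketch below is illustrative, not the exact method from the thesis; the `generate` callable, the half-and-half sample split, and the verbatim-match criterion are all assumptions.

```python
def split_sample(sample: str, prefix_frac: float = 0.5):
    """Split a training sample into a prompt prefix and the suffix to recover."""
    cut = int(len(sample) * prefix_frac)
    return sample[:cut], sample[cut:]

def extraction_rate(generate, samples, prefix_frac: float = 0.5) -> float:
    """Fraction of samples whose held-out suffix the model reproduces verbatim.

    `generate` is any callable mapping a prompt string to a completion string,
    e.g. a wrapper around a fine-tuned model's generation API (hypothetical here).
    """
    hits = 0
    for sample in samples:
        prefix, suffix = split_sample(sample, prefix_frac)
        completion = generate(prefix)
        if suffix in completion:  # count a hit only on an exact verbatim match
            hits += 1
    return hits / len(samples)
```

In practice `generate` would call the fine-tuned model with greedy or low-temperature decoding; a higher extraction rate on training samples than on held-out samples is the signal that the model has memorized its fine-tuning data.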
This work has implications for the privacy of individuals whose data is used for fine-tuning, as well as for businesses or groups that use fine-tuned models in public-facing software.
Thesis Committee
- Matt Fredrikson (Chair)
- Yuvraj Agarwal
Presentation details: April 29, 2024, 1:00–2:00pm, Gates Hillman 9115, Carnegie Mellon University
