CL AI, Nov 11, 2024

On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

arXiv:2411.07070v21.93 citations, h-index: 1, Has Code

Originality: Incremental advance

AI Analysis

The paper addresses privacy issues for the language model fine-tuning community by providing a tool to audit and mitigate risks. The advance is incremental, since it builds on existing membership inference attack methods.

The paper tackles privacy risks in supervised fine-tuning of language models by introducing an active auditing framework called Parsing, which uses improved white-box membership inference attacks to identify and quantify leakage. Experiments on models such as GPT-2 and Llama2 show that the framework is efficient and reveal notable privacy concerns in the fine-tuning process.

The pretraining and fine-tuning approach has become the leading technique for various NLP applications. However, recent studies reveal that fine-tuning data, due to their sensitive nature, domain-specific characteristics, and identifiability, pose significant privacy concerns. To help develop more privacy-resilient fine-tuning models, we introduce a novel active privacy auditing framework, dubbed Parsing, designed to identify and quantify privacy leakage risks during the supervised fine-tuning (SFT) of language models (LMs). The framework leverages improved white-box membership inference attacks (MIAs) as the core technology, utilizing novel learning objectives and a two-stage pipeline to monitor the privacy of the LMs' fine-tuning process, maximizing the exposure of privacy risks. Additionally, we have improved the effectiveness of MIAs on large LMs including GPT-2, Llama2, and certain variants of them. Our research aims to provide the SFT community of LMs with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process. Experimental results confirm the framework's efficiency across various models and tasks, emphasizing notable privacy concerns in the fine-tuning process. Project code is available at https://anonymous.4open.science/r/PARSING-4817/.
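For readers unfamiliar with membership inference, the general idea can be illustrated with a generic loss-threshold baseline. This is a minimal sketch and not the Parsing framework or its white-box attack: the function names `predict_members` and `attack_auc` are hypothetical, and the per-example losses are synthetic toy values chosen only to make the example run, not results from the paper. The assumption is simply that a fine-tuned model tends to assign lower loss to examples it was trained on.

```python
from typing import List, Sequence


def predict_members(losses: Sequence[float], threshold: float) -> List[bool]:
    """Flag an example as a suspected training member when its loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_auc(member_losses: Sequence[float], nonmember_losses: Sequence[float]) -> float:
    """Probability that a random member has lower loss than a random non-member.

    Ties count as half a win. A value near 0.5 means the attacker learns
    little; a value near 1.0 indicates strong membership leakage.
    """
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


# Synthetic per-example losses (illustrative only, not measured on any model).
member_losses = [0.21, 0.35, 0.18, 0.90]      # examples seen during fine-tuning
nonmember_losses = [1.10, 0.80, 1.45, 0.95]   # held-out examples

auc = attack_auc(member_losses, nonmember_losses)
member_flags = predict_members(member_losses, threshold=0.5)
nonmember_flags = predict_members(nonmember_losses, threshold=0.5)

print(f"attack AUC: {auc}")
print(f"members flagged: {member_flags}")
print(f"non-members flagged: {nonmember_flags}")
```

An auditor could track a score like this AUC across fine-tuning checkpoints to see when leakage starts to grow. The abstract describes a stronger white-box attack with its own learning objectives, which this baseline does not attempt to reproduce.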
