CL CR LG · Feb 19, 2024

Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships

Myung Gyo Oh, Hong Eun Ahn, Leo Hyun Park, Taekyoung Kwon

arXiv:2402.12189v2 · 1.91 citations · h-index: 6 · Has Code · IEEE Trans Inf Forensics Secur

Originality: Incremental advance

AI Analysis

The work addresses a security vulnerability in large language models that poses risks to data privacy, but it is incremental because it builds on prior attack scenarios.

The paper studies training data extraction attacks on neural language models and introduces an adversarial fine-tuning method that amplifies exposure of the original training data, yielding a four- to eightfold increase in exposure for models with over 1B parameters.

Neural language models (LMs) are vulnerable to training data extraction attacks due to data memorization. This paper introduces a novel attack scenario wherein an attacker adversarially fine-tunes pre-trained LMs to amplify the exposure of the original training data. This strategy differs from prior studies by aiming to intensify the LM's retention of its pre-training dataset. To achieve this, the attacker needs to collect generated texts that are closely aligned with the pre-training data. However, without knowledge of the actual dataset, quantifying the amount of pre-training data within generated texts is challenging. To address this, we propose the use of pseudo-labels for these generated texts, leveraging membership approximations indicated by machine-generated probabilities from the target LM. We subsequently fine-tune the LM to favor generations with higher likelihoods of originating from the pre-training data, based on their membership probabilities. Our empirical findings indicate a remarkable outcome: LMs with over 1B parameters exhibit a four- to eightfold increase in training data exposure. We discuss potential mitigations and suggest future research directions.
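The abstract describes a pipeline that scores the target LM's own generations, pseudo-labels the ones more likely to come from the pre-training data, and fine-tunes the model to favor them. The sketch below illustrates only the pseudo-labeling step, under clearly stated assumptions: the membership signal is a DetectGPT-style normalized perturbation discrepancy computed from precomputed log-likelihoods, the logistic calibration and the pairing of all generations from a single prompt are illustrative choices, and `Generation`, `curvature_score`, `membership_prob`, `preference_pairs`, and `pairwise_loss` are hypothetical names, not the authors' code. Scoring text with a real LM and producing perturbed rewrites are left out.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Generation:
    text: str
    logp: float                   # log-likelihood of the text under the target LM (precomputed)
    perturbed_logps: list[float]  # log-likelihoods of perturbed rewrites of the same text


def curvature_score(g: Generation) -> float:
    """DetectGPT-style normalized perturbation discrepancy (assumed membership signal)."""
    if not g.perturbed_logps:
        raise ValueError("need at least one perturbed log-likelihood")
    n = len(g.perturbed_logps)
    mean = sum(g.perturbed_logps) / n
    var = sum((lp - mean) ** 2 for lp in g.perturbed_logps) / n
    std = math.sqrt(var) if var > 0 else 1.0
    return (g.logp - mean) / std


def membership_prob(score: float, temperature: float = 1.0) -> float:
    """Map a score to (0, 1) with a numerically stable logistic (hypothetical calibration)."""
    z = score / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_pairs(gens: list[Generation]) -> list[tuple[str, str]]:
    """Pseudo-label each pair of generations: the higher-membership text is 'chosen'.

    Assumes every generation in `gens` was sampled from the same prompt.
    """
    scored = [(membership_prob(curvature_score(g)), g) for g in gens]
    pairs = []
    for (pa, a), (pb, b) in combinations(scored, 2):
        if pa == pb:
            continue  # no preference signal for ties
        chosen, rejected = (a, b) if pa > pb else (b, a)
        pairs.append((chosen.text, rejected.text))
    return pairs


def pairwise_loss(r_chosen: float, r_rejected: float, beta: float = 1.0) -> float:
    """Bradley-Terry style loss, -log sigmoid(beta * margin), that a fine-tuning
    step could minimize to favor chosen texts (an assumed objective)."""
    return math.log1p(math.exp(-beta * (r_chosen - r_rejected)))


if __name__ == "__main__":
    gens = [
        Generation("text A", logp=-40.0, perturbed_logps=[-48.0, -50.0, -46.0]),
        Generation("text B", logp=-40.0, perturbed_logps=[-41.0, -39.0, -40.0]),
    ]
    print(preference_pairs(gens))  # [('text A', 'text B')]
```

In this toy example, perturbing "text A" lowers its log-likelihood sharply, so it sits near a local maximum of the model's probability and is labeled the likelier member; "text B" shows no such gap. A pairwise objective is used here only because it is a common way to turn such labels into a fine-tuning signal; the paper's actual training procedure may differ.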
