CR LG · Apr 1, 2024

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini

arXiv:2404.01231v1 · 22.736 citations · h-index: 18 · Has Code · NIPS

Originality: Incremental advance

AI Analysis

The work exposes a critical privacy risk for users of open-source pre-trained models, particularly those who fine-tune them. Its advance is incremental: it reveals a new vulnerability within existing attack frameworks.

The paper introduces a privacy backdoor attack that significantly increases membership inference leakage when fine-tuning pre-trained models, achieving up to 30% higher attack success rates compared to standard models in experiments.

It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
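To make "leaked at a significantly higher rate" concrete, here is a minimal sketch of a generic loss-threshold membership inference test, assuming the attacker can obtain a per-example loss from the fine-tuned model. It is not the paper's procedure: the function name `tpr_at_fpr`, the threshold rule, and all loss values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def tpr_at_fpr(member_losses: Sequence[float],
               nonmember_losses: Sequence[float],
               target_fpr: float = 0.01) -> float:
    """True-positive rate of a loss-threshold attack at a fixed false-positive rate.

    The attacker guesses "member" whenever an example's loss is strictly
    below a threshold calibrated on known non-members.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")

    sorted_non = sorted(nonmember_losses)
    # How many non-members may be wrongly flagged without exceeding target_fpr.
    allowed = int(target_fpr * len(sorted_non))
    # Flagging losses strictly below the (allowed + 1)-th smallest non-member
    # loss produces at most `allowed` false positives, even with ties.
    threshold = sorted_non[allowed] if allowed < len(sorted_non) else float("inf")

    hits = sum(1 for loss in member_losses if loss < threshold)
    return hits / len(member_losses)


# Hypothetical losses, invented for illustration (not results from the paper).
nonmembers = [1.0, 1.1, 1.3, 1.2]
typical_members = [0.9, 1.1, 1.0, 1.2]       # members barely distinguishable
backdoored_members = [0.1, 0.2, 0.15, 0.3]   # members stand out sharply

typical_tpr = tpr_at_fpr(typical_members, nonmembers, target_fpr=0.25)
backdoored_tpr = tpr_at_fpr(backdoored_members, nonmembers, target_fpr=0.25)
```

In this toy setup, a wider gap between member and non-member losses translates directly into a higher true-positive rate at the same false-positive budget, which is the kind of amplification the abstract describes.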
