CR · May 7

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

Zeyuan Chen, Yihan Ma, Xinyue Shen, Michael Backes, Yang Zhang

arXiv:2605.06423 · 68.0

Predicted impact: top 22% in CR (last 90 days) · Originality: Incremental advance

AI Analysis

This work exposes persistent privacy vulnerabilities in modern LLMs, showing that adversaries can infer training-data membership with high accuracy.

The PopQuiz Attack is a black-box membership inference attack against LLMs that converts target data into multiple-choice questions. Across six models and four datasets, it attains an average ROC-AUC of 0.873 and outperforms existing methods by 20.6%.
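ROC-AUC, the metric quoted above, can be read as the probability that a randomly chosen training member receives a higher membership score than a randomly chosen non-member. The sketch below computes it directly from that pairwise definition, with ties counting one half. The function name `roc_auc` and the score lists are hypothetical illustrations, not results or code from the paper.

```python
from itertools import product


def roc_auc(member_scores, nonmember_scores):
    """ROC-AUC as P(score of a random member > score of a random non-member),
    counting ties as 1/2 (the Mann-Whitney U formulation)."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for m, n in product(member_scores, nonmember_scores):
        if m > n:
            wins += 1.0
        elif m == n:
            wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Made-up membership scores (e.g. fraction of quiz questions answered correctly).
members = [0.9, 0.8, 0.75, 0.6]
nonmembers = [0.7, 0.5, 0.4, 0.4]
print(roc_auc(members, nonmembers))  # 0.9375 (15 of 16 pairs ranked correctly)
```

This pairwise loop is quadratic in the number of examples; for large evaluations, a rank-based formula or `sklearn.metrics.roc_auc_score` is the usual choice. A value of 0.5 corresponds to random guessing and 1.0 to perfect separation.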

Large language models (LLMs) show strong performance across many applications, but their ability to memorize and potentially reveal training data raises serious privacy concerns. We introduce the PopQuiz Attack, a black-box membership inference attack that tests whether a model can recall specific training examples. The core idea is to turn target data into quiz-style multiple-choice questions and infer membership from the model's answers. Across six widely used LLMs (GPT-3.5, GPT-4o, LLaMA2-7b, LLaMA2-13b, Mistral-7b, and Vicuna-7b) and four datasets, our method achieves an average ROC-AUC of 0.873 and outperforms existing approaches by 20.6%. We further analyze factors affecting attack success, including query complexity, data type, data structure, and training settings. We also evaluate instruction-based, filter-based, and differential privacy-based defenses, which reduce performance but do not eliminate the risk. Our results highlight persistent privacy vulnerabilities in modern LLMs.
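The abstract does not specify how questions are built, how distractors are chosen, or how answers become a membership decision. The sketch below is therefore only a hypothetical illustration of the general idea of quiz-based membership inference. It assumes fill-in-the-blank questions over single words, distractors sampled from a fixed word pool, a black-box model reachable only through a callable `answer_fn(question, options) -> index`, and a fixed accuracy threshold. The names `make_quiz`, `membership_score`, `infer_membership`, and `make_toy_model` are illustrative and are not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box interface: returns the index of the chosen option (-1 = abstain).
AnswerFn = Callable[[str, Sequence[str]], int]
BLANK = "____"


def make_quiz(text: str, distractor_pool: Sequence[str], n_questions: int,
              n_options: int, rng: random.Random) -> List[Tuple[str, List[str], int]]:
    """Assumption: each question blanks out one word of the target text and offers
    the true word plus (n_options - 1) distractors drawn from distractor_pool."""
    words = text.split()
    quiz = []
    for pos in rng.sample(range(len(words)), min(n_questions, len(words))):
        answer = words[pos]
        blanked = " ".join(words[:pos] + [BLANK] + words[pos + 1:])
        candidates = sorted(set(distractor_pool) - {answer})  # distinct, never the answer
        if len(candidates) < n_options - 1:
            raise ValueError("distractor pool too small")
        options = rng.sample(candidates, n_options - 1) + [answer]
        rng.shuffle(options)
        quiz.append((blanked, options, options.index(answer)))
    return quiz


def membership_score(text: str, answer_fn: AnswerFn, distractor_pool: Sequence[str],
                     n_questions: int = 5, n_options: int = 4, seed: int = 0) -> float:
    """Fraction of quiz questions the model answers correctly."""
    quiz = make_quiz(text, distractor_pool, n_questions, n_options, random.Random(seed))
    if not quiz:
        raise ValueError("empty target text")
    correct = sum(answer_fn(q, opts) == gold for q, opts, gold in quiz)
    return correct / len(quiz)


def infer_membership(text: str, answer_fn: AnswerFn, distractor_pool: Sequence[str],
                     threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: predict 'member' when quiz accuracy exceeds threshold."""
    return membership_score(text, answer_fn, distractor_pool) > threshold


def make_toy_model(training_texts: Sequence[str]) -> AnswerFn:
    """Toy stand-in for an LLM that has perfectly memorized its training texts
    and abstains (-1) on passages it does not recognize."""
    train = [t.split() for t in training_texts]

    def answer_fn(question: str, options: Sequence[str]) -> int:
        q = question.split()
        for words in train:
            if len(words) == len(q) and all(a == b or a == BLANK for a, b in zip(q, words)):
                filled = words[q.index(BLANK)]
                if filled in options:
                    return list(options).index(filled)
        return -1

    return answer_fn


member_text = "the quick brown fox jumps over the lazy dog"
pool = ["cat", "red", "slow", "under", "bird", "green"]
model = make_toy_model([member_text])
print(membership_score(member_text, model, pool))  # 1.0
print(infer_membership("a completely different sentence about privacy", model, pool))  # False
```

A real LLM neither memorizes perfectly nor abstains, so in practice the per-example scores would be fed to a threshold sweep or a metric such as ROC-AUC rather than a single hand-picked cutoff.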
