LG | Jun 14, 2024

Semantic Membership Inference Attack against Large Language Models

arXiv:2406.10218v1 | 18 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy risks in large language models by strengthening membership inference attacks, which test whether a given text was part of a model's training data. It is an incremental improvement over existing methods.

The paper introduces SMIA, a membership inference attack on large language models that uses the semantic content of inputs and their perturbations to improve attack performance. It reports an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% for the second-best attack.

Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. Our results show that SMIA significantly outperforms existing MIAs; for instance, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% by the second-best attack.
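
The two technical ideas in the abstract, probing the target model on perturbed inputs and reporting AUC-ROC, can be illustrated with a minimal sketch. This is not the authors' implementation. `loss_fn` and `embed_fn` are hypothetical stand-ins for the target model's loss and a semantic embedding model, and the feature pair used here (loss change, embedding distance) is an assumption about what "semantic content and perturbations" might look like in code. The neural network that SMIA trains on such features is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def perturbation_features(
    text: str,
    neighbours: Sequence[str],
    loss_fn: Callable[[str], float],             # hypothetical: target-model loss on a text
    embed_fn: Callable[[str], Sequence[float]],  # hypothetical: semantic embedding of a text
) -> List[Tuple[float, float]]:
    """For each perturbed neighbour, pair the change in target-model loss
    with the semantic distance between the neighbour and the original."""
    base_loss = loss_fn(text)
    base_vec = embed_fn(text)
    return [
        (loss_fn(n) - base_loss, cosine_distance(base_vec, embed_fn(n)))
        for n in neighbours
    ]


def auc_roc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random member receives a higher
    attack score than a random non-member (ties count as one half)."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

An AUC-ROC of 0.5 corresponds to random guessing, so the reported 67.39% and 58.90% are both best read against that baseline.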

