CR · AI · Nov 29, 2024

LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states

arXiv:2411.19876v3 · 10 citations · h-index: 17 · ESORICS
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for LLM users by improving membership inference, that is, detecting whether a given sample was part of a model's training data. It is an incremental advance, as it builds on existing linear probing techniques.

The paper tackles membership inference attacks on large language models by proposing LUMIA, a method that trains linear probes on the models' internal activations. In unimodal settings, LUMIA achieves an average AUC gain of 15.71% over previous techniques; it reaches AUC>60% in 65.33% of unimodal cases and in 85.90% of multimodal experiments.
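The AUC figures above refer to the area under the ROC curve of the attack's membership scores. Assuming the standard rank-based (Mann-Whitney) form, with $s$ the probe's membership score, $M$ the set of member (training) samples and $N$ the set of non-members, it can be written as:

```latex
\mathrm{AUC} \;=\; \frac{1}{|M|\,|N|} \sum_{i \in M} \sum_{j \in N}
\Bigl( \mathbb{1}[\, s_i > s_j \,] \;+\; \tfrac{1}{2}\, \mathbb{1}[\, s_i = s_j \,] \Bigr)
```

An AUC of 50% corresponds to random guessing, so AUC>60% marks the cases where the probe separates members from non-members noticeably better than chance.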

Large Language Models (LLMs) are increasingly used in a variety of applications, but concerns around membership inference have grown in parallel. Previous efforts focus on black-box to grey-box models, thus neglecting the potential benefit of internal LLM information. To address this, we propose the use of Linear Probes (LPs) as a method to detect Membership Inference Attacks (MIAs) by examining the internal activations of LLMs. Our approach, dubbed LUMIA, applies LPs layer by layer to obtain fine-grained data on the model's inner workings. We test this method across several model architectures, sizes and datasets, including unimodal and multimodal tasks. In unimodal MIA, LUMIA achieves an average gain of 15.71% in Area Under the Curve (AUC) over previous techniques. Remarkably, LUMIA reaches AUC>60% in 65.33% of cases, an increase of 46.80% over the state of the art. Furthermore, our approach reveals key insights, such as the model layers where MIAs are most detectable. In multimodal models, LPs indicate that visual inputs can contribute significantly to detecting MIAs: AUC>60% is reached in 85.90% of experiments.
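The layer-by-layer probing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' LUMIA implementation: the probe is a plain logistic regression fitted with NumPy gradient descent, the helper names (`train_linear_probe`, `layerwise_probe_auc`) are hypothetical, and the activations are synthetic, with a membership signal deliberately planted mostly in one middle layer. With a real model one would instead collect per-layer hidden states for known member and non-member samples (Hugging Face `transformers` models return them when called with `output_hidden_states=True`), reduced to one vector per sample; that pooling step is an assumption and is not shown.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: P(random member outscores random non-member), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def train_linear_probe(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Logistic-regression probe (hypothetical helper) fitted by full-batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted membership probability
        grad = p - y                            # gradient of the log-loss w.r.t. logits
        w -= lr * (X.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b

def layerwise_probe_auc(acts, y, train_frac=0.5, seed=0):
    """acts: (n_layers, n_samples, hidden_dim). Trains one probe per layer, returns held-out AUCs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(acts.shape[1])
    cut = int(train_frac * acts.shape[1])
    tr, te = idx[:cut], idx[cut:]
    aucs = []
    for layer_acts in acts:
        mu = layer_acts[tr].mean(axis=0)
        sd = layer_acts[tr].std(axis=0) + 1e-8
        Xs = (layer_acts - mu) / sd             # standardize with training-split statistics
        w, b = train_linear_probe(Xs[tr], y[tr])
        aucs.append(auc(Xs[te] @ w + b, y[te]))
    return aucs

# Synthetic stand-in for hidden states: 1 = member, 0 = non-member.
rng = np.random.default_rng(42)
n_layers, n_samples, hidden_dim = 6, 400, 32
y = rng.integers(0, 2, size=n_samples).astype(float)
strength = np.array([0.0, 0.1, 0.4, 1.2, 0.4, 0.1])  # planted signal, strongest in layer 3
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
acts = rng.normal(size=(n_layers, n_samples, hidden_dim))
acts += strength[:, None, None] * (2 * y - 1)[None, :, None] * direction[None, None, :]

aucs = layerwise_probe_auc(acts, y)
best = int(np.argmax(aucs))
for layer, a in enumerate(aucs):
    print(f"layer {layer}: AUC = {a:.3f}")
print("most detectable layer:", best)
```

Each layer gets its own probe, and comparing held-out AUC across layers is what points to where membership is most detectable; in this synthetic setup the sweep should single out layer 3, simply because that is where the signal was planted. Real per-layer profiles, and how visual inputs are handled in the multimodal case, depend on the model and the paper's actual protocol.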
