LG · CR · Sep 29, 2024

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

arXiv:2409.19798v2 · 51 citations · h-index: 14
AI Analysis

This paper addresses a critical legal and ethical problem for data creators and owners in lawsuits against foundation models. It highlights an incremental but important limitation of existing methods.

The paper argues that membership inference attacks are fundamentally unsound for proving that a model was trained on specific data. A convincing proof requires sampling from the null hypothesis (that the model was not trained on that data), and this is impossible because the training set is unknown and a large foundation model cannot be efficiently retrained. The paper suggests two viable alternatives: data extraction attacks and membership inference on canary data.

We consider the problem of a training data proof, where a data creator or owner wants to demonstrate to a third party that some machine learning model was trained on their data. Training data proofs play a key role in recent lawsuits against foundation models trained on web-scale data. Many prior works suggest instantiating training data proofs using membership inference attacks. We argue that this approach is fundamentally unsound: to provide convincing evidence, the data creator needs to demonstrate that their attack has a low false positive rate, i.e., that the attack's output is unlikely under the null hypothesis that the model was not trained on the target data. Yet, sampling from this null hypothesis is impossible, as we do not know the exact contents of the training set, nor can we (efficiently) retrain a large foundation model. We conclude by offering two paths forward, showing that data extraction attacks and membership inference on special canary data can be used to create sound training data proofs.
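The role of the null hypothesis can be illustrated with a small sketch. It assumes a setting in which the data creator picked one canary uniformly at random from a pool of candidates and published only that one, so that under the null hypothesis (the model never saw the canary) the published canary is exchangeable with the unpublished candidates. The function names and the notion of a "membership score" (for example, a negated model loss) are hypothetical choices made for illustration and are not taken from the paper. Scores here are simulated rather than computed from a real model.

```python
import random


def rank_p_value(target_score, counterfactual_scores):
    """One-sided rank-test p-value.

    Returns the fraction of all candidates (target included) whose
    membership score is at least as high as the target's score.
    Under the null hypothesis the target is exchangeable with the
    counterfactuals, so P(p-value <= alpha) <= alpha.
    """
    n_at_least = 1 + sum(s >= target_score for s in counterfactual_scores)
    return n_at_least / (1 + len(counterfactual_scores))


def simulate_false_positive_rate(n_trials, n_counterfactuals, alpha, rng):
    """Estimate the false positive rate when the null hypothesis holds.

    ASSUMPTION: under the null, the target's score is drawn from the
    same distribution as the counterfactual scores (simulated here as
    standard normal draws; no real model is queried).
    """
    false_positives = 0
    for _ in range(n_trials):
        target = rng.gauss(0.0, 1.0)
        others = [rng.gauss(0.0, 1.0) for _ in range(n_counterfactuals)]
        if rank_p_value(target, others) <= alpha:
            false_positives += 1
    return false_positives / n_trials


if __name__ == "__main__":
    rng = random.Random(0)
    fpr = simulate_false_positive_rate(2000, 99, 0.05, rng)
    print(f"empirical false positive rate at alpha=0.05: {fpr:.3f}")
```

The bound on the false positive rate holds here only because the null distribution is known by construction: the canary was sampled at random, so its rank among the unpublished candidates is uniform when the model was not trained on it. For ordinary data there is no such pool of exchangeable counterfactuals, which is the gap the abstract points to.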

