LG · Oct 25, 2024

Measuring memorization in language models via probabilistic extraction

DeepMind
arXiv:2410.19482v3 · 40 citations · h-index: 26 · NAACL
Originality: Incremental advance
AI Analysis

This work offers an incremental improvement in how privacy risks to users of language models are evaluated, by refining the way training-data extraction is measured.

The authors tackle the problem of measuring memorization in large language models. They show that the standard discoverable extraction method is unreliable under non-deterministic sampling, and they introduce a probabilistic variant that gives a more nuanced assessment of extraction risk across models and sampling schemes.

Large language models (LLMs) are susceptible to memorizing training data, raising concerns about the potential extraction of sensitive information at generation time. Discoverable extraction is the most common method for measuring this issue: split a training example into a prefix and suffix, then prompt the LLM with the prefix, and deem the example extractable if the LLM generates the matching suffix using greedy sampling. This definition yields a yes-or-no determination of whether extraction was successful with respect to a single query. Though efficient to compute, we show that this definition is unreliable because it does not account for non-determinism present in more realistic (non-greedy) sampling schemes, for which LLMs produce a range of outputs for the same prompt. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. We evaluate our probabilistic measure across different models, sampling schemes, and training-data repetitions, and find that this measure provides more nuanced information about extraction risk compared to traditional discoverable extraction.
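The two definitions contrasted in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the `model` callable (a hypothetical interface returning the next-token distribution that the sampling scheme actually draws from, for example after temperature or top-k), the toy vocabulary, and every function name are invented for illustration. As we read the probabilistic definition, an example counts as (n, p)-discoverably extractable when the chance of emitting the exact suffix at least once in n independent queries, 1 - (1 - p_z)^n, is at least p, where p_z is the per-query probability of generating that suffix. Both checks need only the per-token probabilities along the target suffix, which is consistent with the abstract's "without additional cost" claim.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given the tokens so far, return the next-token
# distribution that the sampling scheme actually draws from (for example,
# after applying temperature or top-k truncation).
NextTokenDist = Callable[[Sequence[str]], Dict[str, float]]


def greedy_extractable(model: NextTokenDist, prefix: List[str], suffix: List[str]) -> bool:
    """Discoverable extraction: greedy decoding from the prefix must reproduce the suffix."""
    context = list(prefix)
    for target in suffix:
        dist = model(context)
        if max(dist, key=dist.get) != target:  # greedy picks the most likely token
            return False
        context.append(target)  # so far the greedy output equals the target
    return True


def suffix_probability(model: NextTokenDist, prefix: List[str], suffix: List[str]) -> float:
    """p_z: probability that a single sampled continuation equals the suffix exactly."""
    context = list(prefix)
    prob = 1.0
    for target in suffix:
        prob *= model(context).get(target, 0.0)  # chain rule over the suffix tokens
        context.append(target)
    return prob


def extraction_probability(p_z: float, n: int) -> float:
    """Probability of emitting the suffix at least once in n independent queries."""
    return 1.0 - (1.0 - p_z) ** n


def is_np_extractable(p_z: float, n: int, p: float) -> bool:
    """(n, p) check: extraction within n queries happens with probability >= p."""
    return extraction_probability(p_z, n) >= p


def queries_needed(p_z: float, p: float) -> float:
    """Smallest n with 1 - (1 - p_z)**n >= p, for 0 < p < 1 (inf if p_z is 0)."""
    if p_z <= 0.0:
        return math.inf
    if p_z >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))


# Hypothetical toy "model": the next-token distribution depends only on the last token.
TOY = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.5, "ran": 0.3, "slept": 0.2},
}


def toy_model(context: Sequence[str]) -> Dict[str, float]:
    return TOY.get(context[-1], {"<eos>": 1.0})


if __name__ == "__main__":
    prefix, suffix = ["the"], ["cat", "ran"]
    p_z = suffix_probability(toy_model, prefix, suffix)
    print(greedy_extractable(toy_model, prefix, suffix))  # False: greedy emits "cat sat"
    print(round(extraction_probability(p_z, 10), 2))  # 0.86
```

In this toy setting, greedy decoding never produces "cat ran", so the yes-or-no test reports the example as not extractable, yet ten sampled queries would produce it with probability of roughly 0.86. That is the kind of gap between the single-query and multi-query views that the abstract describes.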
