Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

arXiv:2601.19936v1 | 2.71 citations | h-index: 2

Originality Highly original

AI Analysis

Gap-K% addresses privacy and copyright concerns for LLM users and developers, and it represents a strong, specific gain in pretraining data detection.

The paper tackles the problem of detecting whether a text was part of a large language model's pretraining data, a question that bears on privacy and copyright. It proposes Gap-K%, a method that measures the log probability gap between the model's top-1 predicted token and the target token and smooths that gap with a sliding window. The method achieves state-of-the-art performance on the WikiMIA and MIMIR benchmarks.

The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model's top-1 prediction and local correlation between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
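The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only the top-1 versus target log probability gap and the sliding window. Everything else here is an assumption made for illustration: the function names (`token_gaps`, `sliding_mean`, `gap_k_score`), the window size, the rule of averaging the K% largest smoothed gaps (borrowed from Min-K%-style detectors), and the sign convention in which a higher score suggests the text was more likely seen in pretraining.

```python
import math


def token_gaps(log_probs, targets):
    """Per-position gap: log p(top-1 token) - log p(target token).

    log_probs: one list of vocabulary log probabilities per position.
    targets:   the index of the actual next token at each position.
    The gap is 0 when the target is the top-1 prediction, positive otherwise.
    """
    return [max(row) - row[t] for row, t in zip(log_probs, targets)]


def sliding_mean(values, window):
    """Average over a sliding window to damp token-level fluctuations."""
    if window < 1 or not values:
        raise ValueError("window must be >= 1 and values must be non-empty")
    if len(values) <= window:
        return [sum(values) / len(values)]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def gap_k_score(log_probs, targets, k=0.2, window=2):
    """Hypothetical aggregation: negated mean of the K% largest smoothed gaps.

    Assumption (not stated in the abstract): the worst-predicted windows
    carry the membership signal, as in Min-K%-style detectors.
    """
    smoothed = sliding_mean(token_gaps(log_probs, targets), window)
    n = max(1, math.ceil(k * len(smoothed)))
    worst = sorted(smoothed, reverse=True)[:n]
    return -sum(worst) / n


# Toy example with a 3-token vocabulary and 4 positions.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.9, 0.05, 0.05],
]
log_probs = [[math.log(p) for p in row] for row in probs]

targets_seen = [0, 1, 2, 0]    # every target is the top-1 prediction
targets_unseen = [1, 0, 2, 1]  # most targets differ from the top-1 prediction

score_seen = gap_k_score(log_probs, targets_seen)
score_unseen = gap_k_score(log_probs, targets_unseen)
```

In this toy setup the text whose tokens always match the top-1 prediction has zero gap everywhere, so it receives a higher score than the text whose tokens frequently diverge from the top-1 prediction. In practice the log probabilities would come from the language model under test, and a decision threshold would have to be calibrated on data with known membership.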
