Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
For developers and researchers who need to detect unauthorized use of proprietary datasets in LLMs, especially when access to model internals is restricted.
The paper addresses the problem of detecting whether a dataset was used to train an LLM without requiring access to log probabilities. It proposes CatShift, which exploits catastrophic forgetting to measure how much a model's outputs shift after fine-tuning, and reports effective inference on both open-source and API-based models.
Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling practical protection of proprietary datasets.
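The comparison step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the shift score (one minus the Jaccard overlap of token sets), the one-sided permutation test, the threshold `alpha`, and all function names are assumptions chosen for illustration. The sketch assumes you have already collected text completions from the model before and after fine-tuning, for both the suspect dataset and a known non-member validation set.

```python
import random
from statistics import mean


def token_overlap_shift(before: str, after: str) -> float:
    """Hypothetical shift score: 1 - Jaccard similarity of whitespace token sets.

    0.0 means the two completions share all tokens; 1.0 means they share none.
    """
    a, b = set(before.split()), set(after.split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def permutation_p_value(suspect, reference, n_perm=10000, seed=0):
    """One-sided permutation test: is mean(suspect) larger than mean(reference)?

    This test is an assumed stand-in for whatever statistical comparison
    the paper actually uses.
    """
    rng = random.Random(seed)
    observed = mean(suspect) - mean(reference)
    pooled = list(suspect) + list(reference)
    k = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def infer_membership(suspect_shifts, nonmember_shifts, alpha=0.05):
    """Flag the suspect dataset as a likely member if its shifts are
    significantly larger than those of the known non-member set."""
    p = permutation_p_value(suspect_shifts, nonmember_shifts)
    return p < alpha, p
```

In use, one would compute `token_overlap_shift` for each prompt in both sets (pre-fine-tuning completion versus post-fine-tuning completion) and pass the two lists of scores to `infer_membership`. Because only generated text is needed, the procedure works without logit access; a stronger text-similarity metric could be substituted for the token-overlap score.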