Fast Training Dataset Attribution via In-Context Learning
This work addresses the challenge of dataset attribution for researchers and practitioners using LLMs, offering an incremental improvement in robustness over existing methods.
The paper tackles the problem of estimating training data contributions in instruction-tuned large language models by proposing two novel approaches: a similarity-based method and a mixture distribution model. The results show that the mixture model approach is more robust to retrieval noise and yields more reliable estimates.
We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data to the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
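The two ideas can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes that LLM outputs are available as discrete probability distributions over a shared vocabulary. The function names `similarity_score` and `mixture_weights` are hypothetical, the choice of KL divergence as the "difference" is an assumption, and the mixture step is reduced to the simplest case of the factorization view: the per-source distributions are taken as known and only the mixing weights are estimated with EM.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def similarity_score(p_with_context, p_without_context):
    """Hypothetical contribution score: how far the context shifts the output.

    Assumption: KL divergence stands in for the unspecified "difference".
    """
    return kl_divergence(p_with_context, p_without_context)


def mixture_weights(observed, components, iters=200):
    """EM estimate of w such that observed ~= sum_k w[k] * components[k].

    Assumption: the component distributions (one per data source) are
    known and fixed; only the weights on the simplex are estimated.
    """
    k = len(components)
    w = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        new_w = [0.0] * k
        for v, p_v in enumerate(observed):
            if p_v == 0:
                continue
            denom = sum(w[j] * components[j][v] for j in range(k))
            if denom == 0:
                continue
            # Responsibility of source j for vocabulary item v,
            # weighted by the observed probability mass p_v.
            for j in range(k):
                new_w[j] += p_v * w[j] * components[j][v] / denom
        total = sum(new_w)
        w = [x / total for x in new_w]
    return w


# Toy data: two sources, three vocabulary items.
source_a = [0.7, 0.2, 0.1]
source_b = [0.1, 0.2, 0.7]
observed = [0.8 * a + 0.2 * b for a, b in zip(source_a, source_b)]

weights = mixture_weights(observed, [source_a, source_b])
shift = similarity_score(observed, source_b)
```

In this toy setup the estimated weights play the role of contribution scores for the two sources, while `shift` is a single number describing how much one distribution departs from another. The sketch does not model retrieval noise, so it says nothing about the robustness comparison reported above.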