CL | AI | Apr 1, 2024

Source-Aware Training Enables Knowledge Attribution in Language Models

Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng

AI2

arXiv:2404.01019v314.633 citations | h-index: 27 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses LLM transparency and verifiability for users, though the advance is incremental: it builds on existing pretraining and fine-tuning frameworks with minimal architectural changes.

The paper tackles the problem of enabling large language models to cite the pretraining sources that support their generated responses. Through experiments on synthetic data, it demonstrates that a source-aware training recipe can achieve faithful attribution without substantially increasing perplexity compared to standard pretraining.

Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs this ability, we explore source-aware training: a recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning stage to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training borrows from existing pretraining/fine-tuning frameworks and requires minimal changes to the model architecture or implementation. Through experiments on synthetic data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's perplexity compared to standard pretraining. Our findings also highlight the importance of pretraining data augmentation in achieving attribution. Code and data are available at https://github.com/mukhal/intrinsic-source-citation
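
The two-stage recipe in the abstract can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation: the `<id>...</id>` markers, the prompt template, the `Document` class, and both helper functions are hypothetical names chosen for illustration, and the sample document is made-up toy data. The sketch only shows how training text might be formatted so that (i) each document carries its identifier and (ii) an instruction-tuning target ends with the identifier of a supporting document.

```python
from dataclasses import dataclass

# Hypothetical identifier markers; the paper's actual format may differ.
ID_OPEN, ID_CLOSE = "<id>", "</id>"


@dataclass
class Document:
    doc_id: str  # unique source identifier (illustrative format)
    text: str    # document content


def inject_doc_id(doc: Document) -> str:
    """Stage (i): append the identifier to the document text so that a
    standard language-modeling objective can associate content with its ID."""
    return f"{doc.text} {ID_OPEN}{doc.doc_id}{ID_CLOSE}"


def citation_example(question: str, answer: str, doc: Document) -> dict:
    """Stage (ii): build one instruction-tuning pair whose target is the
    answer followed by the identifier of the supporting document."""
    prompt = f"Question: {question}\nAnswer and cite the source:"
    target = f"{answer} {ID_OPEN}{doc.doc_id}{ID_CLOSE}"
    return {"prompt": prompt, "target": target}


# Toy, made-up data for illustration only.
docs = [Document("doc_0017", "Jane Doe works as a pilot.")]
pretraining_corpus = [inject_doc_id(d) for d in docs]
sft_example = citation_example("What is Jane Doe's job?", "Pilot", docs[0])
```

Placing the identifier at the end of the document is one of several possible choices in this sketch; the abstract itself does not specify where the identifier goes, only that the model is trained to associate it with the document's content.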
