LG CL | Sep 1, 2024

ContextCite: Attributing Model Generation to Context

Benjamin Cohen-Wang, Harshay Shah, Kristian Georgiev, Aleksander Madry

MIT

arXiv:2409.00729v233.383 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

ContextCite addresses the need for interpretability and reliability in AI-generated content, particularly for users who rely on context-aware models, though it is an incremental improvement in attribution methods.

The paper tackles the problem of context attribution in language models, introducing ContextCite to pinpoint which parts of the context influence generated statements, and demonstrates its utility in verifying statements, improving response quality, and detecting poisoning attacks.

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
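To make the context attribution problem concrete, here is a minimal sketch of one simple baseline, leave-one-out ablation: remove each context source in turn and measure how much the support for a generated statement drops. This illustrates the problem setup only and is not presented as the ContextCite method or its API. The function names and `toy_score` are hypothetical; `toy_score` is a word-overlap stand-in for what would, in practice, be a language model's probability of the statement given the context.

```python
import re
from typing import Callable, List


def leave_one_out_attribution(
    sources: List[str],
    score: Callable[[List[str]], float],
) -> List[float]:
    """Return, for each source, the drop in score when that source is removed."""
    full = score(sources)
    return [
        full - score(sources[:i] + sources[i + 1:])
        for i in range(len(sources))
    ]


def toy_score(statement: str) -> Callable[[List[str]], float]:
    """Hypothetical stand-in for a model's confidence in `statement`:
    the fraction of statement words found anywhere in the context."""
    words = set(re.findall(r"\w+", statement.lower()))

    def score(context: List[str]) -> float:
        seen = set(re.findall(r"\w+", " ".join(context).lower()))
        return len(words & seen) / len(words)

    return score


# Illustrative context split into sentence-level sources (assumed granularity).
sources = [
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
    "The tower was completed in 1889.",
]
statement = "eiffel tower completed 1889"

attributions = leave_one_out_attribution(sources, toy_score(statement))
top_source = max(range(len(sources)), key=lambda i: attributions[i])
```

In this toy setup the unrelated second source receives an attribution of zero, which is the kind of signal that would make it a candidate for pruning, while the sources with the largest scores are the ones to inspect when verifying the statement.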
