LG · CL · Sep 1, 2024

ContextCite: Attributing Model Generation to Context

MIT
arXiv:2409.00729v2 · 77 citations · h-index: 11 · Has Code
AI Analysis

This work addresses the need for interpretability and reliability in AI-generated content, particularly for users who rely on context-aware models, though it is an incremental improvement over existing attribution methods.

The paper tackles the problem of context attribution in language models, introducing ContextCite to pinpoint which parts of the context influence generated statements, and demonstrates its utility in verifying statements, improving response quality, and detecting poisoning attacks.

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
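To make the problem statement concrete, here is a minimal sketch of context attribution by leave-one-out ablation: remove one context sentence at a time and record how much a score for the generated statement drops. This is not ContextCite itself, whose algorithm is not described on this page. The function names (`split_sources`, `toy_score`, `leave_one_out`) are hypothetical, and `toy_score` is a word-overlap stand-in for what would, in practice, be a language model's probability of the statement given the context.

```python
from typing import Callable, List


def split_sources(context: str) -> List[str]:
    """Split a context into sentence-level sources (naive split on '. ')."""
    return [s.strip().rstrip(".") for s in context.split(". ") if s.strip()]


def toy_score(sources: List[str], statement: str) -> float:
    """Hypothetical stand-in for a model's confidence in `statement`.

    Returns the fraction of statement words that appear in the sources.
    A real setup would query a language model instead.
    """
    context_words = set(" ".join(sources).lower().split())
    words = statement.lower().rstrip(".").split()
    if not words:
        return 0.0
    return sum(w in context_words for w in words) / len(words)


def leave_one_out(
    sources: List[str],
    statement: str,
    score: Callable[[List[str], str], float] = toy_score,
) -> List[float]:
    """Attribution of source i = score drop when source i is removed."""
    full = score(sources, statement)
    return [
        full - score(sources[:i] + sources[i + 1:], statement)
        for i in range(len(sources))
    ]


context = (
    "The Eiffel Tower is in Paris. "
    "It was completed in 1889. "
    "Bananas are yellow."
)
statement = "The tower was completed in 1889"

sources = split_sources(context)
scores = leave_one_out(sources, statement)

# Rank sources from most to least influential under the toy score.
ranked = sorted(zip(scores, sources), reverse=True)
for drop, source in ranked:
    print(f"{drop:.2f}  {source}")
```

Under this toy score, the sentence about 1889 receives the largest attribution and the unrelated sentence about bananas receives zero. A source with zero attribution is a candidate for pruning, and a statement with no attributed source at all is a candidate for closer verification, which mirrors the first two applications listed in the abstract.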

Code Implementations: 1 repo