CLLGJun 18, 2025

TokenShapley: Token Level Context Attribution with Shapley Value

arXiv:2507.05261v213 citationsh-index: 5ACL
Originality Incremental advance
AI Analysis

This addresses the need for fine-grained attribution in LLMs, particularly for specific keywords like numbers or names, representing an incremental advance over sentence-level methods.

The paper tackles the problem of verifying correctness in large language model responses by proposing TokenShapley, a token-level attribution method that combines Shapley values with KNN retrieval, achieving an 11-23% improvement in accuracy on benchmarks.

Large language models (LLMs) demonstrate strong capabilities in in-context learning, but verifying the correctness of their generated responses remains a challenge. Prior work has explored attribution at the sentence level, but these methods fall short when users seek attribution for specific keywords within the response, such as numbers, years, or names. To address this limitation, we propose TokenShapley, a novel token-level attribution method that combines Shapley value-based data attribution with KNN-based retrieval techniques inspired by recent advances in KNN-augmented LLMs. By leveraging a precomputed datastore for contextual retrieval and computing Shapley values to quantify token importance, TokenShapley provides a fine-grained data attribution approach. Extensive evaluations on four benchmarks show that TokenShapley outperforms state-of-the-art baselines in token-level attribution, achieving an 11-23% improvement in accuracy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes