CL · AI · LG · May 26

Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

arXiv:2601.03089 · 44.7 · h-index: 3
AI Analysis

For researchers evaluating attribution methods in LLMs, this work provides a more rigorous evaluation framework and a new attribution method, though the gains are incremental.

The authors propose π-Soft-NC and π-Soft-NS, faithfulness metrics that control for the number of retained words during perturbation, addressing a confound in existing soft-perturbation metrics. They also introduce Grad-ELLM, a gradient-based attribution method for decoder-only LLMs, which shows strong comprehensiveness-oriented faithfulness under π-Soft-NC on classification and open-generation tasks with Llama and Mistral.

Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can conflate attribution quality with the number of words retained during perturbation: attribution methods with larger average scores may keep more words and therefore obtain inflated scores. To address this issue, we propose $π$-Soft-NC and $π$-Soft-NS, an evaluation framework that compares attribution methods under the same expected retaining probability, thus controlling the number of retained words. We further introduce Grad-ELLM, a gradient-based attribution method tailored to autoregressive decoder-only LLMs, which combines gradient-derived channel importance with attention-derived token importance at each decoding step. Experiments on classification and open-generation tasks with Llama and Mistral show that Grad-ELLM achieves strong comprehensiveness-oriented faithfulness under $π$-Soft-NC, while there is no dominant method under $π$-Soft-NS. Our evaluation metric serves as a rigorous framework to compare XAI methods for LLMs, which will support progress in the field.
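The confound described in the abstract can be illustrated with a small sketch. The abstract does not give the exact definition of $π$-Soft-NC or $π$-Soft-NS, so the code below is not the authors' formulation; it only shows one way to put two attribution methods on the same expected retaining probability. The function name `retention_probs`, the sigmoid mapping, and the `temperature` parameter are all assumptions made for illustration. The idea is to shift each method's scores by an offset, found by bisection, so that the mean keep-probability equals a chosen `pi`.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def retention_probs(scores, pi, temperature=0.1, iters=100):
    """Map raw attribution scores to keep-probabilities whose mean is pi.

    Hypothetical helper, not the paper's definition: each score s becomes
    sigmoid((s - offset) / temperature), and the offset is chosen by
    bisection so that the average probability equals pi.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")

    # At the low end of the bracket nearly every word is kept,
    # at the high end nearly none, so the target mean lies in between.
    lo = min(scores) - 40.0 * temperature
    hi = max(scores) + 40.0 * temperature
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean = sum(_sigmoid((s - mid) / temperature) for s in scores) / len(scores)
        if mean > pi:
            lo = mid  # too many words retained on average: raise the offset
        else:
            hi = mid  # too few retained: lower the offset
    offset = (lo + hi) / 2.0
    return [_sigmoid((s - offset) / temperature) for s in scores]
```

For example, calling `retention_probs([0.05, 0.10, 0.30, 0.02], pi=0.5)` and `retention_probs([0.60, 0.90, 0.95, 0.55], pi=0.5)` yields two probability vectors that both average about 0.5, even though the second method's raw scores are much larger. The ranking of words within each method is preserved, so a comparison made under these probabilities reflects how the scores are distributed over words rather than how many words each method happens to keep.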

