CL · Feb 21, 2025

A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

arXiv:2502.15886v1 · 10 citations · h-index: 32 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized evaluation of XAI methods in NLP, though it is incremental as it focuses on comparing existing methods rather than introducing new ones.

The paper addresses the lack of comparative analysis between decomposition-based XAI methods such as ALTI-Logit and LRP for transformer language models. It evaluates them quantitatively on a subject-verb agreement task with BERT, GPT-2, and LLaMA-3, and releases a public benchmark dataset and code.

Various XAI attribution methods have been recently proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One class of methods that seems very promising in this direction includes decomposition-based approaches, i.e., XAI-methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. We note, though, that in the previous literature two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition, and hence we propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2 and LLaMA-3 as a testbed. Along the way we compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant, from an algorithmic and implementation perspective. We further incorporate in our benchmark two widely-used gradient-based attribution techniques. Finally, we make our carefully constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available in order to foster evaluation of XAI-methods on a well-defined common ground.
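
As a rough illustration of what "redistributing the model's prediction logit" means, the sketch below decomposes the logit of a hand-made linear readout into one relevance score per input token. This is a toy under stated assumptions, not the paper's implementation of ALTI-Logit or LRP: the sentence, the scalar features, the weights, and the helper name `decompose_logit` are all hypothetical. For a purely linear model, weight times input is an exact decomposition and coincides with Gradient×Input; the methods compared in the paper differ in how they carry such a decomposition through attention, normalization, and nonlinearities, none of which is modeled here.

```python
# Toy illustration only: a single linear readout, not a transformer.

def decompose_logit(weights, bias, features):
    """Split logit = sum_i(w_i * x_i) + b into one relevance score per input."""
    contributions = [w * x for w, x in zip(weights, features)]
    logit = sum(contributions) + bias
    return logit, contributions


# Hypothetical subject-verb agreement prefix; the subject is "keys".
tokens = ["The", "keys", "to", "the", "cabinet"]

# Hand-picked scalar features and weights (assumptions, not learned values).
features = [0.1, 2.0, 0.3, 0.1, -0.5]
weights = [0.2, 1.5, 0.1, 0.2, 0.8]
bias = 0.0

logit, relevance = decompose_logit(weights, bias, features)

# Conservation check: the relevance scores plus the bias reproduce the logit.
assert abs(sum(relevance) + bias - logit) < 1e-9

# The token with the largest relevance is the one the toy model relies on most.
top_index = max(range(len(tokens)), key=lambda i: relevance[i])
top_token = tokens[top_index]

for token, score in zip(tokens, relevance):
    print(f"{token:>8s}  {score:+.2f}")
```

In a ground-truth evaluation of the kind the abstract describes, one would check whether the highest-scoring token matches the annotated one (here, the subject); the exact metric used in the paper is not stated in this summary.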

Code Implementations: 2 repos