CL · Jan 18, 2024

Better Explain Transformers by Illuminating Important Information

arXiv:2401.09972v3 · 107 citations · Has code · Findings
Originality: Incremental advance
AI Analysis

This work proposes a more accurate explanation method for Transformer-based NLP models. It is an incremental advance because it builds on layer-wise relevance propagation (LRP).

The paper addresses the problem of explaining Transformer models by removing the distortion that irrelevant information causes in existing methods. The resulting approach improves explanation metrics by 3% to 33% over baselines.

Transformer-based models excel at various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by using the raw gradient and attention as token attribution scores, so non-relevant information is often included in the explanation computation, producing confusing results. In this work, we propose highlighting the important information and eliminating irrelevant information through a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as important attention heads and focus on the relevance obtained from these heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. Compared to eight baselines on both classification and question-answering datasets, our method consistently performs better, with improvements of 3% to 33% on explanation metrics. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
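To make the idea in the abstract concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' Mask-LRP code. It shows three pieces under stated assumptions: (1) the generic LRP-epsilon rule for one linear layer, as an example of how relevance is redistributed backwards; (2) a toy test for "positional" heads, assumed here to mean heads whose strongest attention falls on the token at relative offset -1 or +1 in at least a `threshold` fraction of rows (the 0.9 default is illustrative); and (3) summing token relevance only over heads flagged as important, so relevance routed through the other heads is masked. The per-head relevance input `head_relevance` is assumed to come from an LRP pass (for example, relevance reaching each head's slice of the concatenated attention output). Syntactic-head detection needs dependency parses and is omitted. All function names are illustrative.

```python
# Hypothetical sketch, not the authors' Mask-LRP implementation.

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def lrp_epsilon_linear(x, W, r_out, eps=1e-6):
    """Generic LRP-epsilon rule for one linear layer z = x @ W.

    x: n_in input activations; W: n_in rows of n_out weights;
    r_out: n_out relevance scores at the output. Returns n_in input relevances.
    """
    n_in, n_out = len(x), len(r_out)
    z = [sum(x[j] * W[j][k] for j in range(n_in)) for k in range(n_out)]
    # Stabilised ratio R_k / (z_k + eps * sign(z_k)), with sign(0) taken as +1.
    s = [r_out[k] / (z[k] + (eps if z[k] >= 0 else -eps)) for k in range(n_out)]
    # R_j = x_j * sum_k W_jk * s_k
    return [x[j] * sum(W[j][k] * s[k] for k in range(n_out)) for j in range(n_in)]


def positional_heads(attn, threshold=0.9):
    """Flag heads whose argmax key is at offset -1 or +1 for >= threshold of rows.

    attn[h][i][j]: attention weight of head h from query token i to key token j.
    This criterion is an illustrative assumption.
    """
    flagged = set()
    for h, rows in enumerate(attn):
        n = len(rows)
        for offset in (-1, 1):
            valid = [i for i in range(n) if 0 <= i + offset < n]
            if not valid:
                continue
            hits = sum(1 for i in valid if argmax(rows[i]) == i + offset)
            if hits / len(valid) >= threshold:
                flagged.add(h)
    return flagged


def masked_token_relevance(head_relevance, important):
    """Sum per-head token relevance, zeroing (masking) heads not in `important`.

    head_relevance[h][t]: relevance of token t that flows through head h.
    """
    total = [0.0] * len(head_relevance[0])
    for h, rel in enumerate(head_relevance):
        if h not in important:
            continue  # relevance routed through this head is discarded
        for t, r in enumerate(rel):
            total[t] += r
    return total


# Toy data: 2 heads over 3 tokens. Head 0 attends to the next token whenever
# one exists; head 1 attends mostly to itself.
attn = [
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1]],
    [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]],
]
head_relevance = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
important = positional_heads(attn)  # {0}
print(masked_token_relevance(head_relevance, important))  # [0.5, 0.3, 0.2]
```

In the toy run, only head 0 passes the positional test, so the large relevance that head 1 assigns to the last token is dropped from the final attribution. Masking by summing over a chosen subset of heads keeps the sketch simple. A full method would apply such a mask inside the backward relevance pass through every layer, not only at the end.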

Code implementations: 1 repo