LG, Apr 23, 2022

Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

Oren Barkan, Edan Hauon, Avi Caciularu, Ori Katz, Itzik Malkiel, Omri Armstrong, Noam Koenigstein

arXiv:2204.11073v1  22.967 citations  h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses the need for interpretability in NLP models, which researchers and practitioners rely on to understand and trust model decisions. The contribution appears incremental, since it builds on existing gradient-based explanation methods.

The paper tackles the problem of explaining predictions made by transformer-based language models. It introduces Gradient Self-Attention Maps (Grad-SAM), a gradient-based method that analyzes self-attention units to identify the input elements that best explain a prediction, and it reports significant improvements over state-of-the-art alternatives on various benchmarks.

Transformer-based language models significantly advanced the state-of-the-art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM) - a novel gradient-based method that analyzes self-attention units and identifies the input elements that explain the model's prediction the best. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
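
The abstract does not spell out the computation, so the following is only a minimal sketch of the general idea of weighting self-attention maps by gradients to rank input tokens. It is not the authors' implementation. The function name `grad_weighted_attention_scores`, the array layout `(layers, heads, tokens, tokens)`, the ReLU on the gradients, and the plain averaging over layers, heads, and query positions are all assumptions made for illustration. The attention maps and their gradients with respect to the model's output score are assumed to have been extracted already.

```python
import numpy as np


def grad_weighted_attention_scores(attn, grad):
    """Rank tokens by gradient-weighted self-attention (illustrative sketch).

    attn: attention maps, shape (layers, heads, tokens, tokens), where
          attn[l, h, i, j] is the attention from query token i to key token j.
    grad: gradients of the model's output score with respect to attn,
          same shape. Both inputs are assumed to be precomputed.
    """
    attn = np.asarray(attn, dtype=float)
    grad = np.asarray(grad, dtype=float)
    if attn.ndim != 4 or attn.shape != grad.shape:
        raise ValueError("attn and grad must share shape (layers, heads, tokens, tokens)")

    # Keep only gradients that push the output score up (assumed ReLU step),
    # then weight each attention entry by its gradient.
    weighted = attn * np.maximum(grad, 0.0)

    # Assumed aggregation: average over layers, heads, and query positions,
    # leaving one importance score per attended-to (key) token.
    return weighted.mean(axis=(0, 1, 2))


# Toy example: one layer, one head, two tokens.
attn = np.array([[[[0.5, 0.5],
                   [0.2, 0.8]]]])
grad = np.array([[[[1.0, -1.0],
                   [0.0, 2.0]]]])
scores = grad_weighted_attention_scores(attn, grad)
ranking = np.argsort(-scores)  # token indices, most important first
```

In this toy case the second token receives the higher score because it has both high attention and a positive gradient. In practice the two input arrays would come from a framework with automatic differentiation.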
