CL · May 18, 2021

Effective Attention Sheds Light On Interpretability

arXiv:2105.08855v1 · 712 citations
AI Analysis

This work addresses an interpretability challenge for researchers and practitioners who use transformers. It offers a way of analyzing model behavior that is more directly tied to the model output, though the contribution is incremental because it builds on the existing attention mechanism.

The paper tackles the problem of interpreting transformer self-attention by decomposing each attention matrix into an effective and a non-effective component, and shows that only the effective component influences the model output. Its analysis shows that interpretations drawn from effective attention differ from those drawn from standard attention: effective attention is less tied to features of the pretraining objective (such as the separator token) and more aligned with the linguistic features the model uses for the end task.
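A minimal NumPy sketch of the decomposition, assuming one attention head whose output is A @ T, where A is the (n, n) row-softmax attention matrix and T is the (n, d) matrix of value vectors (assumed here to be X @ W_V). When the sequence length n exceeds the head dimension d, T has a non-trivial left null space, and any part of an attention row lying in that space is multiplied away. Projecting the rows of A onto the column space of T with the pseudoinverse is one standard way to compute the effective part; the function name `effective_attention` and the random inputs are hypothetical, not taken from the paper's code.

```python
import numpy as np


def effective_attention(A: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Return the part of A that can influence the head output A @ T.

    A: (n, n) attention probabilities for one head (each row sums to 1).
    T: (n, d) value vectors for the same head (assumed to be X @ W_V).
    """
    # T @ pinv(T) is the orthogonal projector onto the column space of T in R^n.
    # Right-multiplying removes each row's component in the left null space of T,
    # i.e. the directions v with v @ T == 0.
    return A @ T @ np.linalg.pinv(T)


rng = np.random.default_rng(0)
n, d = 12, 4  # sequence length > head dimension, so the left null space is non-trivial
scores = rng.normal(size=(n, n))
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
T = rng.normal(size=(n, d))

A_eff = effective_attention(A, T)
A_null = A - A_eff  # the non-effective component

print(np.allclose(A_eff @ T, A @ T))  # True: the head output is unchanged
print(np.allclose(A_null @ T, 0.0))   # True: the null component contributes nothing

# When n <= d and T has full row rank, the projector is the identity and A_eff == A.
T_small = rng.normal(size=(3, 5))
A_small = np.full((3, 3), 1 / 3)
print(np.allclose(effective_attention(A_small, T_small), A_small))  # True
```

Note that the rows of the effective matrix need not be non-negative or sum to one, so it is not itself a probability distribution; it is simply the component of the attention weights that the output depends on.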

An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components and only one of them (effective attention) contributes to the model output. This leads us to ask whether visualizing effective attention gives different conclusions than interpretation of standard attention. Using a subset of the GLUE tasks and BERT, we carry out an analysis to compare the two attention matrices, and show that their interpretations differ. Effective attention is less associated with the features related to the language modeling pretraining such as the separator token, and it has more potential to illustrate linguistic features captured by the model for solving the end-task. Given the differences found, we recommend using effective attention for studying a transformer's behavior since it is more pertinent to the model output by design.
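A toy comparison in the spirit of the analysis described in the abstract, comparing standard and effective attention for a single head. The two metrics (the mean weight placed on a separator position and the Pearson correlation between the flattened matrices) are illustrative choices, not necessarily the paper's exact protocol, and the inputs are random tensors rather than BERT attention on GLUE. The names `compare_attention` and `sep_index` are hypothetical, and the projection is the same pseudoinverse construction assumed above (A @ T @ pinv(T), with T the head's value vectors).

```python
import numpy as np


def effective_attention(A: np.ndarray, T: np.ndarray) -> np.ndarray:
    # Project each attention row onto the column space of the value matrix T.
    return A @ T @ np.linalg.pinv(T)


def compare_attention(A: np.ndarray, T: np.ndarray, sep_index: int) -> dict:
    """Illustrative comparison of standard vs. effective attention for one head."""
    A_eff = effective_attention(A, T)
    return {
        # Average weight that queries place on the separator position.
        "sep_standard": float(A[:, sep_index].mean()),
        "sep_effective": float(A_eff[:, sep_index].mean()),
        # Pearson correlation between the flattened matrices.
        "pearson": float(np.corrcoef(A.ravel(), A_eff.ravel())[0, 1]),
    }


rng = np.random.default_rng(1)
n, d = 16, 4
sep = n - 1  # toy layout: treat the last position as the separator token
scores = rng.normal(size=(n, n))
scores[:, sep] += 4.0  # make this toy head attend heavily to the separator
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
T = rng.normal(size=(n, d))

stats = compare_attention(A, T, sep)
print(stats)
```

Because effective attention can contain negative entries, "sep_effective" is a signed average rather than a probability mass; a real analysis would extract A and the value vectors from a trained model and aggregate over heads, layers, and examples.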

Code Implementations: 1 repo