Staying True to Your Word: (How) Can Attention Become Explanation?
This work tackles the transparency issue in NLP models for researchers and practitioners, offering an incremental improvement to make attention more reliable as an explanation method.
The paper asks whether attention mechanisms in NLP can reliably serve as explanations for model decisions, focusing on recurrent networks in sequence classification tasks. It proposes a word-level objective that improves the faithfulness of attention as an interpretation tool, lending credibility to its use with recurrent models.
The attention mechanism has quickly become ubiquitous in NLP. In addition to improving model performance, attention has been widely used as a glimpse into the inner workings of NLP models. The latter use has in recent years become a common topic of discussion, most notably in the work of Jain and Wallace (2019) and Wiegreffe and Pinter (2019). With the shortcomings of using attention weights as a transparency tool revealed, the attention mechanism has been stuck in limbo, without concrete proof of when and whether it can be used as an explanation. In this paper, we explain why attention has seen rightful critique when used with recurrent networks in sequence classification tasks. We propose a remedy to these issues in the form of a word-level objective, and our findings lend credibility to attention as a source of faithful interpretations of recurrent models.