CL · May 31, 2021

Attention Flows are Shapley Value Explanations

arXiv:2105.14652v1723 citations
Originality: Incremental advance
AI Analysis

The paper gives a theoretical justification for using attention flows as explanations in NLP, though the advance is incremental because it builds on existing Shapley Value theory.

The paper connects attention-based explanations in NLP to Shapley Values: it proves that, outside the degenerate case, attention weights and leave-one-out values cannot be Shapley Values, whereas attention flows are, at least at the layerwise level.
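
For readers unfamiliar with the term, a Shapley Value is a player's marginal contribution to the payoff, averaged over every order in which the players could join. The sketch below computes it by brute force for a small cooperative game. The function names and the three-player game are hypothetical illustrations and are not taken from the paper.

```python
from itertools import permutations


def shapley_values(players, payoff):
    """Exact Shapley Values by averaging marginal contributions over all orderings.

    `payoff` maps a frozenset of players to a number. Cost grows factorially
    with the number of players, so this is only suitable for tiny games.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += payoff(with_p) - payoff(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_payoff(coalition):
    """Hypothetical game: each player has a base worth; a and b earn a joint bonus."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 3.0 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_payoff)
print(phi)
```

In this toy game the bonus is split evenly between a and b, and the values sum to the payoff of the full coalition. That summation property (efficiency) is one of the theoretical qualities that make Shapley Values attractive as explanations.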

Shapley Values, a solution to the credit assignment problem in cooperative game theory, are a popular type of explanation in machine learning, having been used to explain the importance of features, embeddings, and even neurons. In NLP, however, leave-one-out and attention-based explanations still predominate. Can we draw a connection between these different methods? We formally prove that -- save for the degenerate case -- attention weights and leave-one-out values cannot be Shapley Values. Attention flow is a post-processed variant of attention weights obtained by running the max-flow algorithm on the attention graph. Perhaps surprisingly, we prove that attention flows are indeed Shapley Values, at least at the layerwise level. Given the many desirable theoretical qualities of Shapley Values -- which have driven their adoption among the ML community -- we argue that NLP practitioners should, when possible, adopt attention flow explanations alongside more traditional ones.
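
To make "running the max-flow algorithm on the attention graph" concrete, here is a minimal sketch. It assumes a layered graph with one node per token position per layer, edge capacities equal to the attention weights, and no residual connections; the paper's exact construction may differ. The helper names (`max_flow`, `attention_flow`) and the two-layer, two-token weights are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink, eps=1e-12):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph (source != sink)."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = defaultdict(dict)
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] = residual[u].get(v, 0.0) + c
            residual[v].setdefault(u, 0.0)
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def attention_flow(attention, target):
    """Max flow from top-layer position `target` to each input position.

    attention[l][i][j] is the weight with which position i in layer l + 1
    attends to position j in layer l (an assumed layout).
    """
    capacity = defaultdict(dict)
    for l, layer in enumerate(attention):
        for i, row in enumerate(layer):
            for j, weight in enumerate(row):
                capacity[(l + 1, i)][(l, j)] = weight
    source = (len(attention), target)
    num_inputs = len(attention[0][0])
    return [max_flow(capacity, source, (0, j)) for j in range(num_inputs)]


# Hypothetical attention weights: 2 layers, 2 token positions.
attention = [
    [[0.5, 0.5], [0.2, 0.8]],  # layer 1 attending to layer 0
    [[0.6, 0.4], [0.3, 0.7]],  # layer 2 attending to layer 1
]
flows = attention_flow(attention, target=0)
print(flows)
```

Each entry of `flows` is the largest amount of attention that can be routed from the chosen top-layer position down to one input token, with every edge limited by its attention weight. A pure-Python max-flow routine is used here only to keep the sketch dependency-free; a graph library would do the same job.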

