LG · ML · Dec 20, 2019

When Explanations Lie: Why Many Modified BP Attributions Fail

arXiv:1912.09818v7 · 149 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses, for researchers and practitioners, the problem of unreliable explanations of neural network predictions, revealing fundamental flaws in popular attribution methods.

The paper analyzes a broad set of modified backpropagation attribution methods and finds that all of them except DeepLIFT produce explanations that are independent of the parameters of later network layers, which indicates that they may be unfaithful to the model. It also introduces a new metric, cosine similarity convergence (CSC), to quantify this effect.
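To make "independent of later layers" concrete, here is a minimal NumPy sketch on a hypothetical toy network (a random, bias-free ReLU MLP, not a model from the paper). It uses the LRP z+ rule as one representative modified BP rule, chosen by us for illustration. The paper gives the precise definition of CSC; this sketch does not reproduce it. As a stand-in (our assumption), it tracks, layer by layer, the cosine similarity between the z+ relevance vectors for two different target classes. If that similarity approaches 1 on the way down, the explanation no longer reflects which class, and hence which part of the later layers, it started from. All function names here are hypothetical.

```python
import numpy as np

def make_relu_mlp(widths, rng):
    """Hypothetical toy net: He-initialised weights of a bias-free ReLU MLP."""
    return [rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out))
            for d_in, d_out in zip(widths[:-1], widths[1:])]

def forward(weights, x):
    """Return the input of every layer (ReLU on hidden layers, raw logits at the end)."""
    inputs = []
    for k, w in enumerate(weights):
        inputs.append(x)
        x = x @ w
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return inputs

def zplus_step(a, w, r_out, eps=1e-12):
    """One backward step of the LRP z+ rule: R_i = a_i * sum_j w+_ij * R_j / z_j."""
    w_pos = np.maximum(w, 0.0)
    z = a @ w_pos + eps            # positive pre-activations, shape (d_out,)
    return a * (w_pos @ (r_out / z))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_per_layer(weights, x, class_a, class_b):
    """Cosine similarity of z+ relevance for two target classes, from the logits down."""
    inputs = forward(weights, x)
    n_classes = weights[-1].shape[1]
    r_a = np.eye(n_classes)[class_a]
    r_b = np.eye(n_classes)[class_b]
    sims = [cosine(r_a, r_b)]      # two different one-hot vectors: similarity 0
    for a, w in zip(reversed(inputs), reversed(weights)):
        r_a = zplus_step(a, w, r_a)
        r_b = zplus_step(a, w, r_b)
        sims.append(cosine(r_a, r_b))
    return sims

rng = np.random.default_rng(0)
weights = make_relu_mlp([64] * 10 + [10], rng)
x = rng.uniform(size=64)           # non-negative input keeps every z+ denominator positive
sims = cosine_per_layer(weights, x, class_a=0, class_b=1)
for depth, s in enumerate(sims):
    print(f"{depth} layers back: cosine = {s:.4f}")
```

Each z+ step multiplies the relevance by a non-negative, column-normalised matrix, so repeated steps tend to push different starting vectors toward a common direction. This is the kind of convergence a metric such as CSC is meant to expose.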

Attribution methods aim to explain a neural network's prediction by highlighting the most relevant image areas. A popular approach is to backpropagate (BP) a custom relevance score using modified rules, rather than the gradient. We analyze an extensive set of modified BP methods: Deep Taylor Decomposition, Layer-wise Relevance Propagation (LRP), Excitation BP, PatternAttribution, DeepLIFT, Deconv, RectGrad, and Guided BP. We find empirically that the explanations of all mentioned methods, except for DeepLIFT, are independent of the parameters of later layers. We provide theoretical insights for this surprising behavior and also analyze why DeepLIFT does not suffer from this limitation. Empirically, we measure how information of later layers is ignored by using our new metric, cosine similarity convergence (CSC). The paper provides a framework to assess the faithfulness of new and existing modified BP methods theoretically and empirically. For code see: https://github.com/berleon/when-explanations-lie
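The contrast between backpropagating the gradient and backpropagating a relevance score with a modified rule can be sketched with a check in the spirit of parameter randomization. This sketch makes several assumptions: the hypothetical toy ReLU MLP, the choice of gradient × input and the LRP z+ rule, and the decision to re-draw the last three layers are ours and do not reproduce the paper's experimental setup. The sketch re-draws only the later layers and compares each method's input-level explanation before and after. An explanation that ignores later layers should barely change.

```python
import numpy as np

def relu_mlp(widths, rng):
    """Hypothetical toy model: He-initialised weights of a bias-free ReLU MLP."""
    return [rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out))
            for d_in, d_out in zip(widths[:-1], widths[1:])]

def forward(weights, x):
    """Return each layer's input and the ReLU masks of the hidden layers."""
    inputs, masks = [], []
    for k, w in enumerate(weights):
        inputs.append(x)
        x = x @ w
        if k < len(weights) - 1:          # the last layer outputs raw logits
            masks.append((x > 0).astype(float))
            x = np.maximum(x, 0.0)
    return inputs, masks

def gradient_x_input(weights, x, target):
    """Ordinary backprop of d logit[target] / dx, multiplied elementwise by x."""
    _, masks = forward(weights, x)
    g = weights[-1][:, target]
    for w, m in zip(reversed(weights[:-1]), reversed(masks)):
        g = w @ (g * m)
    return g * x

def zplus(weights, x, target, eps=1e-12):
    """Modified BP with the LRP z+ rule: only positive weights carry relevance."""
    inputs, _ = forward(weights, x)
    r = np.eye(weights[-1].shape[1])[target]
    for a, w in zip(reversed(inputs), reversed(weights)):
        w_pos = np.maximum(w, 0.0)
        r = a * (w_pos @ (r / (a @ w_pos + eps)))
    return r

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
weights = relu_mlp([64] * 8 + [10], rng)
x = rng.uniform(size=64)                  # non-negative input
# Re-draw only the last three layers; the first five are kept unchanged.
randomized = weights[:-3] + relu_mlp([64, 64, 64, 10], np.random.default_rng(1))

sim_grad = cosine(gradient_x_input(weights, x, 0), gradient_x_input(randomized, x, 0))
sim_zplus = cosine(zplus(weights, x, 0), zplus(randomized, x, 0))
print(f"gradient x input: {sim_grad:.3f}   LRP z+: {sim_zplus:.3f}")
```

In this toy setting, the gradient passes through signed weights, so re-drawing the later layers generally changes its direction. The z+ relevance passes only through non-negative, column-normalised matrices in the unchanged lower layers, so it tends to land in nearly the same place. That toy behavior mirrors the limitation the abstract describes; it does not model DeepLIFT.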

Code Implementations: 1 repo