LG · Mar 4, 2017

Axiomatic Attribution for Deep Networks

arXiv:1703.01365v2 · 7954 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable attribution methods in deep learning, which supports model interpretability and helps users engage with models. It is rated incremental because it builds on prior attribution studies.

The authors address the problem of attributing a deep network's predictions to its input features. They identify two fundamental axioms, Sensitivity and Implementation Invariance, that most existing attribution methods fail to satisfy, and they introduce Integrated Gradients, a simple method that requires only a few gradient calls. They apply it to image, text, and chemistry models for debugging and rule extraction.
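To make the Sensitivity axiom concrete, here is a minimal sketch of how a plain gradient-based attribution can violate it. The one-variable function f(x) = 1 - ReLU(1 - x), the baseline 0, and the input 2 are illustrative choices, and the helper names are hypothetical. Sensitivity is read here as: if the input differs from the baseline in one feature and the prediction changes, that feature should receive a nonzero attribution.

```python
def relu(z):
    return max(z, 0.0)

def f(x):
    # Toy one-variable "network": f(x) = 1 - ReLU(1 - x).
    return 1.0 - relu(1.0 - x)

def grad_f(x):
    # Analytic derivative of f: 1 for x < 1, and 0 once the ReLU has flattened out.
    return 1.0 if x < 1.0 else 0.0

baseline, x = 0.0, 2.0

# The prediction changes between baseline and input...
output_change = f(x) - f(baseline)

# ...but the gradient at the input is zero, so gradient * (input - baseline)
# assigns the only feature an attribution of zero.
plain_gradient_attr = grad_f(x) * (x - baseline)

print("change in output:", output_change)
print("gradient-based attribution:", plain_gradient_attr)
```

The function saturates for x > 1, so the local gradient at the input carries no information about the change that happened along the way from the baseline.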

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
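The "few calls to the standard gradient operator" can be sketched as follows. Integrated Gradients attributes to feature i the quantity (x_i - x'_i) times the average of the partial derivative of F with respect to feature i along the straight line from a baseline x' to the input x. The sketch below approximates that average with a midpoint Riemann sum. The toy sigmoid model, its weights, the all-zeros baseline, and the names `model`, `model_grad`, and `integrated_gradients` are assumptions made for illustration; in a real setting the gradient would come from the framework's autodiff rather than a hand-written function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy model (not from the paper): F(x) = sigmoid(w . x).
WEIGHTS = [1.0, -2.0, 0.5]

def model(x):
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)))

def model_grad(x):
    # Analytic gradient of the toy model; stands in for a framework's gradient operator.
    s = model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight-line path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-baseline difference per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)

# Sanity check: the attributions should sum (approximately) to F(x) - F(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
print(attributions, completeness_gap)
```

The sum-to-difference check is a convenient way to choose `steps`: if the gap is not small, the path integral is under-resolved and more steps are needed. The choice of baseline is part of the method's setup and affects the attributions; zeros are used here only for simplicity.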

Code Implementations: 40 repos

Data from Papers with Code (CC-BY-SA-4.0)

