LG · ML · Nov 16, 2017

Towards better understanding of gradient-based attribution methods for Deep Neural Networks

arXiv:1711.06104v4 · 362 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better theoretical and empirical comparisons of explanation methods in AI, though it is incremental as it builds on existing attribution techniques.

The paper compares gradient-based attribution methods for Deep Neural Networks: it formally proves conditions of equivalence and approximation between four such methods and reformulates two of them to build a unified framework. It also introduces a new evaluation metric, Sensitivity-n, and tests the methods on image and text classification datasets with various architectures.

Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework which enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.
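The abstract names Sensitivity-n without defining it. As a rough illustration only, the sketch below assumes the usual reading of the metric: for random subsets of n features, the sum of their attributions should match the change in output when those features are removed (here, set to a zero baseline), and the agreement is scored with a Pearson correlation. The toy linear model, the zero baseline, and all function names are assumptions made for this example, not the paper's code.

```python
import math
import random


def model(x, w):
    """Toy linear model f(x) = w . x (a stand-in for a trained network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_times_input(x, w):
    """Gradient * Input; for a linear model the gradient is w, so a_i = w_i * x_i."""
    return [wi * xi for wi, xi in zip(w, x)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def sensitivity_n(x, w, attributions, n, num_subsets=200, seed=0):
    """Correlate summed attributions with the output drop over random n-subsets."""
    rng = random.Random(seed)
    full_output = model(x, w)
    attr_sums, output_drops = [], []
    for _ in range(num_subsets):
        subset = rng.sample(range(len(x)), n)
        removed = list(x)
        for i in subset:
            removed[i] = 0.0  # assumed "removal": replace with a zero baseline
        attr_sums.append(sum(attributions[i] for i in subset))
        output_drops.append(full_output - model(removed, w))
    return pearson(attr_sums, output_drops)


# Hypothetical 10-feature input and weights, generated only for the demo.
rng = random.Random(42)
w = [rng.uniform(-1, 1) for _ in range(10)]
x = [rng.uniform(-1, 1) for _ in range(10)]

attr = gradient_times_input(x, w)
score = sensitivity_n(x, w, attr, n=3)
print(f"Sensitivity-3 correlation: {score:.6f}")
```

For a linear model, Gradient * Input accounts for the output change exactly, so the correlation is 1 up to floating-point error. For a nonlinear network, the score would generally fall below 1 and vary with n, which is what makes it usable for comparing methods.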

Code Implementations: 2 repos
