There and Back Again: Revisiting Backpropagation Saliency Methods
This work addresses the need for a rigorous understanding and improvement of saliency methods for model interpretability in machine learning, although it is incremental in that it builds on existing backpropagation-based techniques.
The authors address the lack of clarity around backpropagation-based saliency methods by unifying several of them under a single framework. From this framework they derive NormGrad, a new method based on the spatial contributions of convolutional weight gradients. They also study how saliency maps from different layers can be combined, and they introduce a class-sensitivity metric together with a meta-learning inspired technique for improving class sensitivity.
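To make the unified view concrete, the sketch below treats each method as a rule that turns the activations x and the backpropagated gradients g at one layer into a single score per spatial location. It is a minimal sketch under our own assumptions, not the authors' code. Spatial positions are flattened into one index u. The layer is treated as a 1x1 convolution, so NormGrad reduces to the Frobenius norm of the outer product g_u x_u^T, which factorises as ||g_u|| * ||x_u||. The function names are hypothetical.

```python
import math
from typing import List

# x[u][k] and g[u][k]: activation and gradient at flattened spatial
# location u and channel k of a single layer (assumed layout).
Matrix = List[List[float]]


def _l2(v: List[float]) -> float:
    """Euclidean norm of a channel vector."""
    return math.sqrt(sum(c * c for c in v))


def gradient_times_input(x: Matrix, g: Matrix) -> List[float]:
    """Per-location sum over channels of activation * gradient (not rectified)."""
    return [sum(xk * gk for xk, gk in zip(xu, gu)) for xu, gu in zip(x, g)]


def grad_cam(x: Matrix, g: Matrix) -> List[float]:
    """Grad-CAM: weight each channel by its spatially averaged gradient, then ReLU."""
    n, channels = len(g), len(g[0])
    alpha = [sum(g[u][k] for u in range(n)) / n for k in range(channels)]
    return [max(0.0, sum(a * xk for a, xk in zip(alpha, xu))) for xu in x]


def normgrad(x: Matrix, g: Matrix) -> List[float]:
    """NormGrad for a 1x1 conv (assumed case): ||g_u x_u^T||_F = ||g_u|| * ||x_u||."""
    return [_l2(gu) * _l2(xu) for xu, gu in zip(x, g)]


x = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
g = [[0.5, -1.0], [2.0, 1.0], [-1.0, 0.0]]
print(gradient_times_input(x, g))
print(grad_cam(x, g))
print(normgrad(x, g))
```

The three rules differ only in how they aggregate x and g: gradient × input can be negative because it is not rectified, Grad-CAM shares one weight per channel across all locations, and NormGrad is always non-negative because it is a product of norms.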
Saliency methods seek to explain the predictions of a model by producing an importance map for each input sample. A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient. Despite much research on these methods, relatively little work has been done to clarify the differences between them and the desiderata they should satisfy. Thus, there is a need for a rigorous understanding of the relationships between different methods as well as of their failure modes. In this work, we conduct a thorough analysis of backpropagation-based saliency methods and propose a single framework under which several such methods can be unified. As a result of our study, we make three additional contributions. First, we use our framework to propose NormGrad, a novel saliency method based on the spatial contribution of gradients of convolutional weights. Second, we combine saliency maps at different layers to test the ability of saliency methods to extract complementary information at different network levels (e.g., trading off spatial resolution and distinctiveness), and we explain why some methods fail at specific layers (e.g., Grad-CAM anywhere besides the last convolutional layer). Third, we introduce a class-sensitivity metric and a meta-learning inspired paradigm, applicable to any saliency method, for improving sensitivity to the output class being explained.
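The abstract does not spell out the class-sensitivity metric. One plausible form, which we assume here and which may differ from the authors' exact definition, compares the saliency map that explains the highest-scoring class with the one that explains the lowest-scoring class. A class-sensitive method should produce weakly or negatively correlated maps, while a method that ignores the output class produces nearly identical maps with a correlation close to 1. The `saliency_fn` callback, which maps a class index to a flattened saliency map, is a hypothetical interface.

```python
import math
from typing import Callable, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equally sized, flattened saliency maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0.0 or vb == 0.0:
        raise ValueError("correlation is undefined for a constant saliency map")
    return cov / math.sqrt(va * vb)


def class_sensitivity(
    saliency_fn: Callable[[int], List[float]], scores: Sequence[float]
) -> float:
    """Assumed metric: correlation between the maps for the top and bottom
    scoring classes. Lower values indicate a more class-sensitive method."""
    top = max(range(len(scores)), key=scores.__getitem__)
    bottom = min(range(len(scores)), key=scores.__getitem__)
    return pearson(saliency_fn(top), saliency_fn(bottom))


# Toy maps per class index (fabricated for illustration only).
maps = {0: [0.2, 0.1, 0.4, 0.3], 1: [1.0, 2.0, 3.0, 4.0], 2: [4.0, 3.0, 2.0, 1.0]}
scores = [0.1, 2.0, -1.0]  # class 1 scores highest, class 2 lowest
print(class_sensitivity(maps.__getitem__, scores))
```

Averaging this value over a dataset would give a single score per saliency method. Correlation is used rather than a raw distance so that the comparison ignores the overall scale of each map.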