On Spectral Properties of Gradient-based Explanation Methods
This work addresses unreliable explanations of deep network predictions, offering researchers and practitioners incremental improvements through a formal analysis and two practical remedies.
The paper tackles the reliability issues in gradient-based explanation methods for deep networks by analyzing them through probabilistic and spectral perspectives, revealing a pervasive spectral bias and providing remedies like a standard perturbation scale and SpectralLens aggregation, with theoretical results supported by quantitative evaluations.
Understanding the behavior of deep networks is crucial to increasing our confidence in their results. Despite an extensive body of work on explaining their predictions, researchers have faced reliability issues, which can be attributed to insufficient formalism. In our research, we adopt novel probabilistic and spectral perspectives to formally analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradients, and sheds light on some common design choices that have been discovered experimentally, in particular the use of squared gradients and input perturbation. We further characterize how the choice of perturbation hyperparameters in explanation methods such as SmoothGrad can lead to inconsistent explanations, and we introduce two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method that we call SpectralLens. Finally, we substantiate our theoretical results through quantitative evaluations.
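To make the dependence on the perturbation scale concrete, the following is a minimal sketch of SmoothGrad on a toy one-dimensional model. It is an illustration under stated assumptions, not the paper's formalism or implementation: the model f(x) = sum_k sin(w_k x), the function names, and the parameter values are all hypothetical. For this toy model, averaging the gradient over Gaussian input noise with standard deviation sigma damps each frequency w by exp(-(w sigma)^2 / 2), so different choices of sigma keep or suppress different frequency components and can therefore yield different explanations for the same input.

```python
import math
import random


def grad_f(x, freqs):
    """Analytic gradient of the hypothetical toy model f(x) = sum_k sin(w_k * x)."""
    return sum(w * math.cos(w * x) for w in freqs)


def smoothgrad(x, freqs, sigma, n_samples=20000, seed=0):
    """Monte Carlo SmoothGrad: average the gradient over Gaussian-perturbed inputs."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += grad_f(x + rng.gauss(0.0, sigma), freqs)
    return total / n_samples


def smoothgrad_closed_form(x, freqs, sigma):
    """Expected SmoothGrad for the toy model.

    For eps ~ N(0, sigma^2), E[cos(w * (x + eps))] = cos(w * x) * exp(-(w * sigma)^2 / 2),
    so Gaussian perturbation acts as a low-pass filter on the gradient.
    """
    return sum(
        w * math.cos(w * x) * math.exp(-0.5 * (w * sigma) ** 2) for w in freqs
    )
```

Calling `smoothgrad(0.3, [1.0, 8.0], sigma)` for a small and a large `sigma` shows the effect: with a small scale both frequency components contribute, while with a large scale the high-frequency term is almost entirely averaged out.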