A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations
This paper provides a theoretical explanation for the behavior of backpropagation-based visualizations in deep learning. The contribution is incremental, but it clarifies a known issue for interpretability researchers.
The paper tackles the lack of theoretical justification for why guided backpropagation and deconvolutional networks produce human-interpretable but class-insensitive visualizations in CNNs. Its analysis reveals that these methods perform (partial) image recovery that is unrelated to the network's decisions.
Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory is missing to justify their behaviors: guided backpropagation (GBP) and the deconvolutional network (DeconvNet) generate more human-interpretable but less class-sensitive visualizations than the saliency map. Motivated by this, we develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery, which is unrelated to the network decisions. Specifically, our analysis shows that the backward ReLU introduced by GBP and DeconvNet and the local connections in CNNs are the two main causes of compelling visualizations. Extensive experiments are provided that support the theoretical analysis.
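
To make the role of the backward ReLU concrete, the following is a minimal NumPy sketch of how the three methods are commonly described as differing when a signal is propagated backward through a single ReLU. It is an illustration based on the standard definitions of these methods, not the paper's own code; the function name `relu_backward`, the `mode` argument, and the toy arrays are hypothetical.

```python
import numpy as np

def relu_backward(forward_input, upstream_signal, mode="saliency"):
    """Backward pass through one ReLU under three visualization rules.

    Hypothetical helper for illustration only (not from the paper).
    forward_input:   values that entered the ReLU in the forward pass
    upstream_signal: signal arriving from the layer above in the backward pass
    """
    forward_mask = forward_input > 0      # units that were active in the forward pass
    backward_mask = upstream_signal > 0   # positions where the backward signal is positive

    if mode == "saliency":
        # Plain gradient: gate only by the forward activation pattern.
        return upstream_signal * forward_mask
    if mode == "deconvnet":
        # DeconvNet: apply ReLU to the backward signal, ignoring the forward pattern.
        return upstream_signal * backward_mask
    if mode == "gbp":
        # Guided backpropagation: apply both gates.
        return upstream_signal * forward_mask * backward_mask
    raise ValueError(f"unknown mode: {mode}")

# Toy values chosen so that each rule gives a different result.
x = np.array([1.0, -2.0, 3.0, -4.0])
g = np.array([0.5, 0.7, -0.3, -0.9])

saliency_out = relu_backward(x, g, "saliency")
deconvnet_out = relu_backward(x, g, "deconvnet")
gbp_out = relu_backward(x, g, "gbp")
```

In this sketch, the saliency rule keeps negative backward signals at active units, while the DeconvNet and GBP rules discard them. That extra rectification of the backward signal is the "backward ReLU" that the abstract names as one of the two main causes of the compelling visualizations.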