Explaining Local, Global, And Higher-Order Interactions In Deep Learning
This work addresses the need for better interpretability in deep learning, particularly for understanding feature interactions within models, though the contribution is incremental in that it builds on existing gradient-based methods such as Grad-CAM.
The paper tackles the problem of explaining interactions in neural networks. It introduces a method based on cross derivatives for detecting 2-way and higher-order interactions and reports that it outperforms a weight-based attribution technique. It then extends the idea to computer vision with Taylor-CAM, which explains relational reasoning across multiple objects. The claims are supported by qualitative and quantitative results, including a user study.
We present a simple yet highly generalizable method for explaining interacting parts within a neural network's reasoning process. First, we design an algorithm based on cross derivatives for computing statistical interaction effects between individual features, which is generalized to both 2-way and higher-order (3-way or more) interactions. We present results side by side with a weight-based attribution technique, corroborating that cross derivatives are a superior metric for both 2-way and higher-order interaction detection. Moreover, we extend the use of cross derivatives as an explanatory device in neural networks to the computer vision setting by expanding Grad-CAM, a popular gradient-based explanatory tool for CNNs, to the higher order. While Grad-CAM can only explain the importance of individual objects in images, our method, which we call Taylor-CAM, can explain a neural network's relational reasoning across multiple objects. We show the success of our explanations both qualitatively and quantitatively, including with a user study. We will release all code as a tool package to facilitate explainable deep learning.
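To make the idea of cross derivatives as an interaction measure concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a central finite-difference estimate of the mixed partial derivative in place of automatic differentiation, and it uses a hypothetical `toy_model` in place of a trained network. The names `cross_derivative` and `interaction_strengths` are illustrative only.

```python
import itertools
import random


def cross_derivative(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at point x (i != j)."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di * h
        y[j] += dj * h
        return f(y)

    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def interaction_strengths(f, points, n_features):
    """Mean absolute cross derivative for every feature pair over the sample points."""
    scores = {}
    for i, j in itertools.combinations(range(n_features), 2):
        vals = [abs(cross_derivative(f, x, i, j)) for x in points]
        scores[(i, j)] = sum(vals) / len(vals)
    return scores


def toy_model(x):
    # Hypothetical stand-in for a trained network:
    # x0 and x1 interact multiplicatively, x2 contributes additively.
    return x[0] * x[1] + 3.0 * x[2]


random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
scores = interaction_strengths(toy_model, points, 3)

# Rank feature pairs from strongest to weakest estimated interaction.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 6))
```

In this toy setting only the pair (0, 1) has a non-vanishing cross derivative, because a purely additive feature contributes nothing to a mixed partial. A 3-way interaction could be probed in the same spirit with a third-order mixed difference, though the estimate becomes noisier as the order grows.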