Forward Learning for Gradient-based Black-box Saliency Map Generation
This work addresses the need for interpretability in deep, black-box models such as ChatGPT. Its contribution is incremental, however, since it extends existing gradient-based methods to new settings.
The paper tackles the problem of generating gradient-based saliency maps for black-box models such as closed-source APIs, where gradients cannot be computed directly. It introduces a unified framework that estimates gradients with the likelihood ratio method and improves the estimates with blockwise computation. Experiments in black-box settings are reported to validate both the accuracy of the gradient estimates and the explainability of the resulting saliency maps.
Gradient-based saliency maps are widely used to explain deep neural network decisions. However, as models become deeper and more opaque, as with closed-source APIs like ChatGPT, computing gradients becomes challenging, hindering conventional explanation methods. In this work, we introduce a novel unified framework for estimating gradients in black-box settings and generating saliency maps to interpret model decisions. We employ the likelihood ratio method to estimate output-to-input gradients and use them for saliency map generation. Additionally, we propose blockwise computation techniques to enhance estimation accuracy. Extensive experiments in black-box settings validate the effectiveness of our method, demonstrating accurate gradient estimation and explainability of the generated saliency maps. Furthermore, we showcase the scalability of our approach by applying it to explain GPT-Vision, revealing the continued relevance of gradient-based explanation methods in the era of large, closed-source, black-box models.
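To make the idea of forward-only gradient estimation concrete, the following is a minimal sketch of a likelihood ratio estimator with Gaussian input perturbations, not the paper's implementation. It assumes the black-box model is exposed as a scalar-valued function `f` that can only be queried, and it interprets "blockwise computation" as perturbing one contiguous block of input coordinates at a time. The names `lr_gradient`, `sigma`, `n_samples`, and `block_size` are hypothetical and chosen for illustration only.

```python
import numpy as np


def lr_gradient(f, x, sigma=0.1, n_samples=2000, block_size=None, rng=None):
    """Estimate the gradient of a scalar black-box f at x from forward queries.

    Assumption: for z ~ N(0, sigma^2 I), the gradient of E[f(x + z)] with
    respect to x equals E[f(x + z) * z] / sigma^2 (likelihood ratio identity).
    Subtracting the baseline f(x) leaves the expectation unchanged and
    reduces variance.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, dtype=float)
    flat = x.ravel()
    d = flat.size
    # Assumption: "blockwise" means perturbing one block of inputs at a time.
    block_size = d if block_size is None else block_size
    grad = np.zeros(d)
    f0 = f(x)  # baseline query at the unperturbed input
    for start in range(0, d, block_size):
        stop = min(start + block_size, d)
        acc = np.zeros(stop - start)
        for _ in range(n_samples):
            z = rng.normal(0.0, sigma, size=stop - start)
            x_pert = flat.copy()
            x_pert[start:stop] += z  # perturb only the current block
            acc += (f(x_pert.reshape(x.shape)) - f0) * z / sigma**2
        grad[start:stop] = acc / n_samples
    return grad.reshape(x.shape)


def saliency_map(grad):
    """A simple saliency map: the absolute value of the estimated gradient."""
    return np.abs(grad)
```

For a linear function the true gradient is known, which gives a quick sanity check: calling `lr_gradient(lambda v: float(w @ v), x)` for a fixed weight vector `w` should return an estimate close to `w`. Note that the query cost grows with the number of blocks, so a smaller `block_size` trades more model queries for lower variance per block.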