Multiple Different Black Box Explanations for Image Classifiers
This work addresses the problem of gaining deeper insights into classifier behavior for researchers and practitioners, though it is incremental as it builds on existing explanation methods.
The paper tackles the limitation of single explanations for image classifiers by introducing MultEX, an algorithm that computes multiple explanations using a principled causality-based approach, and demonstrates that it finds more and higher-quality explanations across three models and datasets.
Existing explanation tools for image classifiers usually give only a single explanation for an image's classification. For many images, however, image classifiers accept more than one explanation for the image label. These explanations are useful for analyzing the decision process of the classifier and for detecting errors. Thus, restricting the number of explanations to just one severely limits insight into the behavior of the classifier. In this paper, we describe an algorithm and a tool, MultEX, for computing multiple explanations as the output of a black-box image classifier for a given image. Our algorithm uses a principled approach based on actual causality. We analyze its theoretical complexity and evaluate MultEX against the state-of-the-art across three different models and three different datasets. We find that MultEX finds more explanations and that these explanations are of higher quality.