Information-Theoretic Visual Explanation for Black-Box Classifiers
This work addresses the need for interpretable AI by providing more accurate visual explanations for black-box classifiers. The contribution is incremental in that it builds on existing attribution methods.
The paper tackles the problem of explaining the predictions of black-box classifiers. It proposes an information-theoretic method that uses information gain and point-wise mutual information to generate attribution maps, and it reports improved correctness as measured by a quantitative metric.
In this work, we attempt to explain the prediction of any black-box classifier from an information-theoretic perspective. For each input feature, we compare the classifier outputs with and without that feature using two information-theoretic metrics. Accordingly, we obtain two attribution maps: an information gain (IG) map and a point-wise mutual information (PMI) map. The IG map provides a class-independent answer to "How informative is each pixel?", and the PMI map offers a class-specific answer to "How much does each pixel support a specific class?" Compared to existing methods, our method improves the correctness of the attribution maps in terms of a quantitative metric. We also provide a detailed analysis of an ImageNet classifier using the proposed method, and the code is available online.
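To make the "with and without that feature" comparison concrete, the following is a minimal sketch rather than the released implementation. It assumes the textbook forms of the two quantities: PMI as the log-ratio of the class probability with and without a feature, and IG as the KL divergence between the two output distributions. The abstract does not say how a feature is removed, so this sketch simply overwrites it with a fixed baseline value, which is a simplifying assumption. The names `attribution_maps`, `predict`, and `baseline`, as well as the toy linear classifier, are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attribution_maps(predict, x, baseline=0.0, eps=1e-12):
    """Return (ig_map, pmi_map) for a 1-D input x.

    predict  : hypothetical black-box function, input -> class probabilities
    baseline : value used to "remove" a feature (assumption, see lead-in)
    """
    p_with = predict(x)
    n_features, n_classes = x.shape[0], p_with.shape[0]
    ig_map = np.zeros(n_features)                # class-independent
    pmi_map = np.zeros((n_features, n_classes))  # class-specific
    for i in range(n_features):
        x_removed = x.copy()
        x_removed[i] = baseline
        p_without = predict(x_removed)
        # PMI per class: log p(y | x) - log p(y | x without feature i)
        log_ratio = np.log(p_with + eps) - np.log(p_without + eps)
        pmi_map[i] = log_ratio
        # IG: KL divergence between the outputs with and without feature i
        ig_map[i] = np.sum(p_with * log_ratio)
    return ig_map, pmi_map

# Toy two-class linear classifier over three features (illustration only).
W = np.array([[2.0, 0.0, -1.0],
              [-2.0, 0.0, 1.0]])
predict = lambda x: softmax(W @ x)

x = np.array([1.0, 1.0, 1.0])
ig_map, pmi_map = attribution_maps(predict, x)
```

In this toy setup, the second feature has zero weight, so both its IG and PMI values are zero. The first feature has a positive PMI for class 0 and a negative PMI for class 1, which matches the reading of PMI as class-specific support and IG as overall informativeness.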