Two-Stage Holistic and Contrastive Explanation of Image Classification
This addresses the need for more comprehensive model explanations to help users understand overall model behavior and discriminate between competing classes, though it is incremental as it builds on existing explanation methods.
The paper tackles the problem of explaining deep neural network classifiers by proposing a contrastive whole-output explanation (CWOX) method that explains the entire probability distribution over multiple classes, rather than just a single class, and evaluates it using quantitative metrics and human studies.
The need to explain the output of a deep neural network classifier is now widely recognized. While previous methods typically explain a single class in the output, we advocate explaining the whole output, which is a probability distribution over multiple classes. A whole-output explanation can help a human user gain an overall understanding of model behaviour instead of only one aspect of it. It can also provide a natural framework where one can examine the evidence used to discriminate between competing classes, and thereby obtain contrastive explanations. In this paper, we propose a contrastive whole-output explanation (CWOX) method for image classification, and evaluate it using quantitative metrics and through human subject studies. The source code of CWOX is available at https://github.com/vaynexie/CWOX.