LG · CV · Jun 21, 2021

Leveraging Conditional Generative Models in a General Explanation Framework of Classifier Decisions

arXiv:2106.10947v1 · 1 citation
Originality Highly original
AI Analysis

This work addresses the need for trustworthy, human-understandable explanations of AI classifier decisions, which is crucial for real-world applications, though it appears incremental in improving existing explanation methods.

Existing visual explanation methods for classifier decisions often produce noisy and inaccurate maps. The paper proposes a general framework that uses two conditional generative models and defines the explanation as the difference between the images they generate. The authors report significant improvements over state-of-the-art methods on three public datasets, with localization consistent with human annotations.

Providing a human-understandable explanation of classifiers' decisions has become imperative to generate trust in their use for day-to-day tasks. Although many works have addressed this problem by generating visual explanation maps, they often provide noisy and inaccurate results, forcing the use of heuristic regularization unrelated to the classifier in question. In this paper, we propose a new general perspective on the visual explanation problem that overcomes these limitations. We show that a visual explanation can be produced as the difference between two generated images obtained via two specific conditional generative models. Both generative models are trained using the classifier to be explained and a database to enforce the following properties: (i) All images generated by the first generator are classified similarly to the input image, whereas the second generator's outputs are classified oppositely. (ii) Generated images belong to the distribution of real images. (iii) The distances between the input image and the corresponding generated images are minimal, so that the difference between the generated elements only reveals information relevant to the studied classifier. Using symmetrical and cyclic constraints, we present two different approximations and implementations of the general formulation. Experimentally, we demonstrate significant improvements over the state of the art on three different public datasets. In particular, the localization of regions influencing the classifier is consistent with human annotations.
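
To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of an explanation map computed as the per-pixel difference between two generated images. All names here (`explanation_map`, `toy_classifier`, `toy_g_similar`, `toy_g_adversarial`, `l1_distance`) are hypothetical, and the hand-written toy generators stand in for the trained conditional generative models. Property (ii), realism of the generated images, would require a discriminator or a similar mechanism and is not modeled here.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def explanation_map(x: Image,
                    g_similar: Callable[[Image], Image],
                    g_adversarial: Callable[[Image], Image]) -> Image:
    """Absolute per-pixel difference between the two generated images."""
    similar = g_similar(x)
    adversarial = g_adversarial(x)
    return [[abs(s - a) for s, a in zip(row_s, row_a)]
            for row_s, row_a in zip(similar, adversarial)]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference, an assumed proxy for property (iii)."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


# Toy stand-ins (assumptions): the "classifier" only looks at the centre pixel.
def toy_classifier(x: Image) -> int:
    return 1 if x[1][1] > 0.5 else 0


def toy_g_similar(x: Image) -> Image:
    # Keeps the classifier's decision: here simply a copy of the input.
    return [row[:] for row in x]


def toy_g_adversarial(x: Image) -> Image:
    # Flips the classifier's decision by changing only the centre pixel.
    out = [row[:] for row in x]
    out[1][1] = 1.0 - out[1][1]
    return out


x = [[0.1, 0.2, 0.1],
     [0.2, 0.9, 0.2],
     [0.1, 0.2, 0.1]]

emap = explanation_map(x, toy_g_similar, toy_g_adversarial)

# Property (i): same decision for the first generator, opposite for the second.
same_decision = toy_classifier(toy_g_similar(x)) == toy_classifier(x)
flipped_decision = toy_classifier(toy_g_adversarial(x)) != toy_classifier(x)

# Property (iii): both generated images stay close to the input.
dist_similar = l1_distance(x, toy_g_similar(x))
dist_adversarial = l1_distance(x, toy_g_adversarial(x))
```

In this toy setup the map is non-zero only at the centre pixel, the one location the toy classifier depends on. That is the intuition behind property (iii): when both generated images stay close to the input, their difference isolates the regions that drive the decision.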

