CV | AI | LG | Nov 29, 2023

DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations

arXiv:2311.17833v3 | 12 citations | h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses the need for explainability and reliability in safety-critical image classification tasks, though the advance is incremental because it builds on existing guided generation frameworks.

The paper tackles the problem of unreliable and opaque deep learning classifiers by generating images that optimize classifier-derived objectives and using those images to analyze classifier decisions. This uncovers failure modes such as systematic errors in zero-shot CLIP classifiers, and the resulting visual counterfactual explanations outperform previous work.

While deep learning has led to huge progress in complex image classification tasks like ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably these classifiers work in the wild. Furthermore, for safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently. In this paper, we address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation. We analyze the decisions of image classifiers by visual counterfactual explanations (VCEs), detection of systematic mistakes by analyzing images where classifiers maximally disagree, and visualization of neurons and spurious features. In this way, we validate existing observations, e.g. the shape bias of adversarially robust models, as well as novel failure modes, e.g. systematic errors of zero-shot CLIP classifiers. Moreover, our VCEs outperform previous work while being more versatile.
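To make the idea of "images where classifiers maximally disagree" concrete, here is a minimal sketch of one plausible classifier-derived objective. The abstract does not give the exact form, so everything here is an assumption for illustration: the function names `log_softmax` and `disagreement_objective` are hypothetical, and the score is simply the gap in log-probability that two classifiers assign to the same target class. In a guided generation setting, an image generator would be steered toward images that increase such a score; that optimization loop is not shown.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def disagreement_objective(logits_a, logits_b, target):
    """Hypothetical disagreement score (an assumption, not the paper's stated formula).

    Large and positive when classifier A assigns high probability to
    `target` while classifier B assigns low probability to it.
    """
    return log_softmax(logits_a)[target] - log_softmax(logits_b)[target]


if __name__ == "__main__":
    # Two toy classifiers scoring the same image over two classes.
    score = disagreement_objective([2.0, 0.0], [0.0, 2.0], target=0)
    print(f"disagreement score for class 0: {score:.3f}")
```

A score of zero means both classifiers assign the same probability to the target class; images with a large score are candidates for inspecting where the two models systematically differ.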

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
