CV · LG · Dec 19, 2019

Explaining Classifiers using Adversarial Perturbations on the Perceptual Ball

arXiv:1912.09405v4 · 12 citations
Originality: Incremental advance
AI Analysis

The paper bridges counterfactual explanations and adversarial perturbations for image-based classifiers, offering a semantically meaningful explanation method.

The paper tackles the problem of explaining image classifiers by introducing a regularization method for adversarial perturbations based on perceptual loss, resulting in semi-sparse alterations that highlight objects while leaving backgrounds unchanged. It demonstrates effectiveness on three standard explainability benchmarks: weak localization, insertion-deletion, and the pointing game.

We present a simple regularization of adversarial perturbations based upon the perceptual loss. While the resulting perturbations remain imperceptible to the human eye, they differ from existing adversarial perturbations in that they are semi-sparse alterations that highlight objects and regions of interest while leaving the background unaltered. As semantically meaningful adversarial perturbations, they form a bridge between counterfactual explanations and adversarial perturbations in the space of images. We evaluate our approach on several standard explainability benchmarks, namely weak localization, insertion-deletion, and the pointing game, demonstrating that perceptually regularized counterfactuals are an effective explanation for image-based classifiers.
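To make one of the benchmarks named above concrete, here is a minimal sketch of how a pointing-game score is commonly computed: an explanation counts as a hit when its most salient pixel falls inside the annotated object region, and the score is the fraction of hits. This is an illustration under stated assumptions, not the paper's evaluation code. The function names, the inclusive bounding-box format, and the use of a plain 2D saliency grid (for example, per-pixel perturbation magnitude) are all hypothetical choices made here, and the pixel tolerance margin that some protocols allow is omitted.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (row_min, col_min, row_max, col_max), inclusive.
Box = Tuple[int, int, int, int]
SaliencyMap = Sequence[Sequence[float]]


def argmax_location(saliency: SaliencyMap) -> Tuple[int, int]:
    """Return (row, col) of the largest saliency value (first one on ties)."""
    best_value = float("-inf")
    best_location = (0, 0)
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_location = value, (r, c)
    return best_location


def is_hit(saliency: SaliencyMap, box: Box) -> bool:
    """A hit means the most salient pixel lies inside the ground-truth box."""
    r, c = argmax_location(saliency)
    r0, c0, r1, c1 = box
    return r0 <= r <= r1 and c0 <= c <= c1


def pointing_game_accuracy(
    saliency_maps: Sequence[SaliencyMap], boxes: Sequence[Box]
) -> float:
    """Fraction of images whose saliency maximum falls inside the box."""
    if len(saliency_maps) != len(boxes) or not boxes:
        raise ValueError("need one box per saliency map, and at least one pair")
    hits = sum(is_hit(s, b) for s, b in zip(saliency_maps, boxes))
    return hits / len(boxes)


# Toy data, invented purely for illustration.
example_maps = [
    [[0.0, 0.1], [0.9, 0.2]],  # maximum at (1, 0)
    [[0.8, 0.1], [0.0, 0.3]],  # maximum at (0, 0)
]
example_boxes = [(1, 0, 1, 1), (1, 1, 1, 1)]
example_score = pointing_game_accuracy(example_maps, example_boxes)
```

With the toy data, the first map's maximum lies inside its box and the second map's does not, so one of the two images is a hit. A real evaluation would replace the toy grids with explanation maps from the classifier under study and the boxes with dataset annotations.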

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
