AI, LG · Sep 11, 2020

The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples

arXiv:2009.05487v3 · 75 citations
AI Analysis

This work clarifies foundational concepts in AI interpretability and security. The contribution is incremental but important for researchers and practitioners in these domains.

The paper examines the relationship between counterfactual explanations and adversarial examples. It argues that the two are formally distinct with respect to two properties, their relation to the true label and their tolerance with respect to proximity, and predicts that the two fields will increasingly merge as the number of common use-cases grows.

The same method that creates adversarial examples (AEs) to fool image-classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to consider CEs as AEs by another name. We argue that the relationship to the true label and the tolerance with respect to proximity are two properties that formally distinguish CEs and AEs. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and estimate that the fields will merge more and more as the number of common use-cases grows.
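
The shared mechanism described in the abstract can be illustrated with a minimal sketch, assuming a toy linear classifier. The names `W`, `B`, `closest_flip`, and `is_adversarial` are hypothetical and do not come from the paper. The check in `is_adversarial` is one possible reading of the two distinguishing properties (misclassification relative to the true label, and a proximity tolerance `eps`), not the paper's formal definitions.

```python
import math

# Hypothetical linear classifier: predicts 1 when w.x + b > 0, else 0.
W = [1.0, -2.0]
B = 0.5


def score(x):
    return sum(wi * xi for wi, xi in zip(W, x)) + B


def predict(x):
    return 1 if score(x) > 0 else 0


def closest_flip(x, overshoot=1e-3):
    """Smallest L2 change that flips the prediction of the linear model.

    For a linear model this is the projection of x onto the decision
    boundary, pushed slightly past it so that the label actually changes.
    """
    norm_sq = sum(wi * wi for wi in W)
    step = -(1.0 + overshoot) * score(x) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, W)]


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def is_adversarial(x_new, true_label_of_new, x_orig, eps):
    # Assumed reading: the new point is misclassified with respect to its
    # true label and lies within the tolerance eps of the original input.
    misclassified = predict(x_new) != true_label_of_new
    return misclassified and distance(x_new, x_orig) <= eps


x = [1.0, 0.0]
x_flipped = closest_flip(x)  # the same search serves both purposes
```

Under these assumptions, `x_flipped` answers the counterfactual question "what minimal change would alter the decision?" regardless of its true label. Whether it also counts as adversarial depends on the true label passed to `is_adversarial` and on the chosen `eps`.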

