Counterfactual Explanations on Robust Perceptual Geodesics
This work addresses the challenge of generating interpretable and robust counterfactual explanations for vision models, which matters for building trust in AI systems and for debugging them. The contribution is incremental: it refines existing geometric approaches.
The paper tackles the problem of generating meaningful counterfactual explanations in machine learning, focusing on the ambiguity in the choice of distance metric, which can lead to adversarial or unrealistic perturbations. It introduces Perceptual Counterfactual Geodesics (PCG), which uses a perceptually aligned Riemannian metric to produce smooth, semantically valid transitions, and reports that PCG outperforms baselines on three vision datasets.
Latent-space optimization methods for counterfactual explanations, framed as minimal semantic perturbations that change model predictions, inherit the ambiguity of Wachter et al.'s objective: the choice of distance metric dictates whether perturbations are meaningful or adversarial. Existing approaches adopt flat or misaligned geometries, leading to off-manifold artifacts, semantic drift, or adversarial collapse. We introduce Perceptual Counterfactual Geodesics (PCG), a method that constructs counterfactuals by tracing geodesics under a perceptually aligned Riemannian metric induced from robust vision features. This geometry aligns with human perception and penalizes brittle directions, enabling smooth, on-manifold, semantically valid transitions. Experiments on three vision datasets show that PCG outperforms baselines and reveals failure modes hidden under standard metrics.
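To make the geometric idea concrete, the following minimal Python sketch computes the discrete energy of a latent path under a metric pulled back through a feature map. It is an illustration only, not the paper's implementation: `straight_path`, `curve_energy`, `identity`, and `toy_features` are hypothetical names, and the linear feature map is an assumed stand-in for robust vision features, chosen so that one latent direction is deliberately "brittle".

```python
def straight_path(start, end, n_segments):
    """Evenly spaced points on the straight line from start to end (inclusive)."""
    return [
        tuple(s + (e - s) * i / n_segments for s, e in zip(start, end))
        for i in range(n_segments + 1)
    ]


def curve_energy(path, feature_map):
    """Discrete curve energy under the metric pulled back through feature_map.

    Approximates the integral of ||d/dt phi(z(t))||^2 over t in [0, 1] by
    sum_i ||phi(z_{i+1}) - phi(z_i)||^2 / dt, with dt = 1 / n_segments.
    """
    n_segments = len(path) - 1
    feats = [feature_map(z) for z in path]
    total = 0.0
    for a, b in zip(feats, feats[1:]):
        total += sum((bj - aj) ** 2 for aj, bj in zip(a, b))
    return total * n_segments


def identity(z):
    """Flat geometry: distances are measured directly in latent space."""
    return tuple(z)


def toy_features(z):
    """Assumed toy feature map: features react 3x more strongly to z[1]."""
    return (z[0], 3.0 * z[1])


origin = (0.0, 0.0)
along_x = straight_path(origin, (1.0, 0.0), 10)  # unit move in the stable direction
along_y = straight_path(origin, (0.0, 1.0), 10)  # unit move in the brittle direction

flat_x = curve_energy(along_x, identity)
flat_y = curve_energy(along_y, identity)
pull_x = curve_energy(along_x, toy_features)
pull_y = curve_energy(along_y, toy_features)

print("flat metric:     ", flat_x, flat_y)
print("pulled-back metric:", pull_x, pull_y)
```

Under the flat metric the two unit moves cost the same, whereas under the pulled-back metric the move along the second axis costs about nine times more, so an energy-minimizing path would avoid that direction where it can. A real feature extractor is nonlinear, so in practice the minimizing path would generally have to be found by numerical optimization over the intermediate points rather than read off in closed form.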