Counterfactual Explanations via Riemannian Latent Space Traversal
This work addresses the need for natural, actionable explanations among practitioners who rely on complex deep models, and it represents an incremental improvement over existing latent space traversal methods.
The paper tackles the problem of unnatural counterfactual explanations for deep models by introducing a method based on a Riemannian metric pulled back via the decoder and the classifier. The method yields robust counterfactual trajectories with high fidelity on real-world tabular datasets.
The adoption of increasingly complex deep models has fueled an urgent need for insight into how these models make predictions. Counterfactual explanations are a powerful tool for providing actionable explanations to practitioners. Previous counterfactual explanation methods have been designed to traverse the latent space of generative models. Yet these latent spaces are usually greatly simplified, with most of the complexity of the data distribution contained in the decoder rather than in the latent embedding. Thus, naively traversing the latent space without taking the nonlinear decoder into account can lead to unnatural counterfactual trajectories. We introduce counterfactual explanations obtained using a Riemannian metric pulled back via the decoder and the classifier under scrutiny. This metric encodes information about the complex geometric structure of the data and the learned representation, enabling us to obtain robust counterfactual trajectories with high fidelity, as demonstrated by our experiments on real-world tabular datasets.
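To make the idea of a pulled-back metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes the metric at a latent point z takes the standard pullback form G(z) = J(z)^T J(z), where J is the Jacobian of a map h that stacks the decoder output with the classifier output; the abstract does not specify how the two are combined, so this stacking is an assumption. The functions `decoder`, `classifier`, and `composite`, the weights, and the endpoints are all hypothetical toy stand-ins for trained models, and the Jacobian is approximated by finite differences.

```python
import numpy as np

def decoder(z):
    # Hypothetical nonlinear decoder g: R^2 -> R^3 (stand-in for a trained generative model).
    return np.array([z[0], z[1], np.sin(z[0]) * np.cos(z[1])])

def classifier(x):
    # Hypothetical classifier f: R^3 -> (0, 1), a logistic model with made-up weights.
    w = np.array([1.0, -0.5, 2.0])
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def composite(z):
    # Assumed map h(z): decoded point stacked with the classifier output on it.
    x = decoder(z)
    return np.append(x, classifier(x))

def jacobian(fn, z, eps=1e-6):
    # Central finite-difference Jacobian of fn at z, shape (output_dim, latent_dim).
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((fn(z + step) - fn(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(z):
    # G(z) = J^T J: measures latent displacements by their effect in output space.
    J = jacobian(composite, z)
    return J.T @ J

def curve_length(points):
    # Length of a discretized latent curve under the pullback metric (midpoint rule).
    length = 0.0
    for a, b in zip(points[:-1], points[1:]):
        d = b - a
        G = pullback_metric(0.5 * (a + b))
        length += np.sqrt(d @ G @ d)
    return length

z_start = np.array([-1.0, 0.0])   # hypothetical latent code of the input
z_end = np.array([1.0, 0.5])      # hypothetical latent code of a counterfactual
line = np.linspace(z_start, z_end, 50)  # naive straight-line latent traversal

riemannian_length = curve_length(line)
euclidean_length = np.linalg.norm(z_end - z_start)
print(riemannian_length, euclidean_length)
```

In this toy setup the straight latent line is longer under the pullback metric than under the Euclidean one, because the metric also accounts for how the decoder and classifier outputs change along the path. A geodesic-based traversal would instead search for the latent curve that minimizes `curve_length`, which is the sense in which a naive straight line can be an unnatural trajectory.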