Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space
This work addresses the need for actionable and realistic explanations in algorithmic recourse, which is crucial for trustworthy AI adoption. The contribution is incremental, since it builds on existing counterfactual explanation methods.
The paper tackles the problem of generating realistic and computationally efficient counterfactual explanations for binary classifiers by shaping an autoencoder's latent space into a Gaussian mixture distribution and searching that space by linear interpolation. The authors report that the method is competitive with three state-of-the-art methods on image and tabular datasets and efficiently produces explanations that lie closer to the original data manifold.
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions: 1. What are the crucial factors that led to an automated prediction/decision? 2. How can these factors be changed to achieve a more favorable outcome from a user's perspective? Guiding the user's interaction with AI systems by proposing easy-to-understand explanations and feasible, easy-to-attain changes is therefore essential for the trustworthy adoption and long-term acceptance of AI systems. In the literature, various methods have been proposed to generate CEs, and different quality measures have been suggested to evaluate these methods. However, the generation of CEs is usually computationally expensive, and the resulting suggestions are often unrealistic and thus non-actionable. In this paper, we introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions. CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class. We show that our method maintains the characteristics of the input sample during the counterfactual search. In various experiments on image and tabular datasets, we show that the proposed method is competitive across different quality measures and efficiently returns results that are closer to the original data manifold than those of three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications.
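The latent-space search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `latent_counterfactual`, the toy classifier, the grid of 11 interpolation steps, and the rule of stopping at the first class flip are all assumptions made for illustration. In the paper's setting, the classifier call would presumably correspond to decoding the latent point and applying the pre-trained binary classifier.

```python
import numpy as np


def latent_counterfactual(z_query, target_centroid, classify_latent,
                          target_class=1, num_steps=11):
    """Interpolate linearly from z_query toward target_centroid.

    Returns (z, alpha) for the first interpolated point that
    classify_latent assigns to target_class, or None if no point
    on the grid flips the prediction.
    """
    z_query = np.asarray(z_query, dtype=float)
    target_centroid = np.asarray(target_centroid, dtype=float)
    for alpha in np.linspace(0.0, 1.0, num_steps):
        # Convex combination: alpha=0 is the query, alpha=1 the centroid.
        z = (1.0 - alpha) * z_query + alpha * target_centroid
        if classify_latent(z) == target_class:
            return z, float(alpha)
    return None


def toy_classifier(z):
    """Hypothetical stand-in for 'decode z, then classify':
    class 1 if the first latent coordinate is positive, else class 0."""
    return int(z[0] > 0.0)


# Assumed toy values: a class-0 query and the class-1 mixture centroid.
z_query = np.array([-1.0, 0.5])
target_centroid = np.array([3.0, 0.5])

z_cf, alpha = latent_counterfactual(z_query, target_centroid, toy_classifier)
print("counterfactual latent point:", z_cf, "at alpha =", alpha)
```

Stopping at the first point that changes the prediction keeps the counterfactual as close to the query as the step grid allows, which is one plausible way to preserve the characteristics of the input sample; the step size and stopping criterion used in the paper may differ.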