A Framework for Feasible Counterfactual Exploration incorporating Causality, Sparsity and Density
This work addresses the need for interpretable and practical counterfactual explanations in real-world applications, though it appears incremental, since it builds on existing methods.
The authors tackle the problem of generating feasible counterfactual explanations for machine learning models by preserving causal relations and enforcing sparsity. They pair a black-box classifier with a Variational Autoencoder and, on three benchmark datasets, produce feasible and sparse examples that satisfy the causal constraints.
The pressing need to interpret the output of a Machine Learning model with counterfactual (CF) explanations, obtained via small perturbations to the input, has been widely noted in the research community. Although diversity among CF examples is important, not all of them are necessarily feasible. This work uses several benchmark datasets to examine whether, by preserving the logical causal relations among their attributes, CF examples can be generated with a small number of changes to the original input while remaining feasible and genuinely useful to the end user in a real-world setting. To achieve this, we used a black-box model as a classifier, to distinguish the desired class from the input class, and a Variational Autoencoder (VAE) to generate feasible CF examples. As an extension, we also extracted two-dimensional manifolds (one for each dataset) in which the majority of the feasible examples are located, a representation that adequately distinguishes them from infeasible ones. In our experiments we used three commonly used datasets and generated CF examples that are both feasible and sparse and that satisfy all predefined causal constraints, confirming the importance of these constraints with respect to the attributes of a dataset.
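To make the notions of feasibility and sparsity concrete, the following is a minimal sketch of how a candidate CF example might be checked against predefined causal constraints and a sparsity budget. It is an illustration, not the authors' implementation: the attribute names (`age`, `education`, `hours_per_week`), the two constraints, and the `max_changes` threshold are all assumptions chosen for the example.

```python
from typing import Dict

Record = Dict[str, float]


def sparsity(original: Record, counterfactual: Record, tol: float = 1e-9) -> int:
    """Count the attributes whose value differs between the input and the CF example."""
    return sum(abs(original[k] - counterfactual[k]) > tol for k in original)


def satisfies_constraints(original: Record, counterfactual: Record) -> bool:
    """Check two hypothetical causal constraints (assumed for illustration only)."""
    # Assumed unary constraint: age may not decrease.
    if counterfactual["age"] < original["age"]:
        return False
    # Assumed binary constraint: a higher education level requires a higher age.
    education_up = counterfactual["education"] > original["education"]
    age_up = counterfactual["age"] > original["age"]
    if education_up and not age_up:
        return False
    return True


def is_feasible_and_sparse(
    original: Record, counterfactual: Record, max_changes: int = 2
) -> bool:
    """A CF example is accepted if it respects the constraints and changes few attributes."""
    return (
        satisfies_constraints(original, counterfactual)
        and sparsity(original, counterfactual) <= max_changes
    )


# Hypothetical input and two candidate CF examples.
x = {"age": 30.0, "education": 2.0, "hours_per_week": 40.0}
cf_ok = {"age": 33.0, "education": 3.0, "hours_per_week": 40.0}
cf_bad = {"age": 30.0, "education": 3.0, "hours_per_week": 40.0}
```

Here `cf_ok` changes two attributes and raises age together with education, so it passes both checks, while `cf_bad` raises education without any increase in age and is rejected as infeasible. In a full pipeline, a check of this kind would be applied to candidates produced by the generative model and confirmed by the classifier to belong to the desired class.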