Rethinking Distance Metrics for Counterfactual Explainability
This work addresses the need for improved explainability in ML for users seeking clearer model interpretations, though it appears incremental as it builds on existing counterfactual methods.
The paper tackles the problem of generating counterfactual explanations for machine learning classifiers. It proposes a framing that treats counterfactuals as jointly sampled with their references from the data distribution, which yields a tailored distance metric that expresses more nuanced dependencies among covariates.
Counterfactual explanations are a popular method of post-hoc explainability across a variety of settings in machine learning. Such methods explain classifiers by generating new data points that are similar to a given reference but receive a more desirable prediction. In this work, we investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution. Through this framing, we derive a distance metric tailored for counterfactual similarity that can be applied to a broad range of settings. Through both quantitative and qualitative analyses of counterfactual generation methods, we show that this framing allows us to express more nuanced dependencies among the covariates.
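To make concrete what "dependencies among the covariates" can mean for a similarity measure, the sketch below compares plain Euclidean distance with a Mahalanobis distance estimated from data. This is only an illustration under stated assumptions: the abstract does not specify the derived metric, and the Mahalanobis distance is a standard stand-in, not the metric proposed in this work. The synthetic data, the helper `mahalanobis`, and the points `reference`, `along`, and `against` are all hypothetical.

```python
import numpy as np


def mahalanobis(x, y, cov):
    """Distance between x and y, rescaled by the inverse covariance of the data."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# Hypothetical data: two strongly, positively correlated covariates.
rng = np.random.default_rng(0)
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=5000)
cov_est = np.cov(data, rowvar=False)

reference = np.array([0.0, 0.0])
along = np.array([1.0, 1.0])     # candidate that moves with the correlation
against = np.array([1.0, -1.0])  # candidate that moves against the correlation

# Euclidean distance treats both candidates as equally close to the reference.
euclid_along = float(np.linalg.norm(along - reference))
euclid_against = float(np.linalg.norm(against - reference))

# A distribution-aware distance penalizes the candidate that breaks the dependency.
maha_along = mahalanobis(along, reference, cov_est)
maha_against = mahalanobis(against, reference, cov_est)

print(f"Euclidean:   along={euclid_along:.3f}, against={euclid_against:.3f}")
print(f"Mahalanobis: along={maha_along:.3f}, against={maha_against:.3f}")
```

In this toy setting the two candidates are indistinguishable under Euclidean distance, while the data-aware distance ranks the candidate that violates the correlation as much farther from the reference. A metric built from the joint distribution of reference and counterfactual would be expected to capture richer structure than this covariance-only example.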