Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles
This work addresses the need for valid algorithmic recourse and meaningful explanations in machine learning, focusing on randomized ensembles. The contribution is incremental in that it builds on existing counterfactual explanation methods.
The paper tackles the problem of generating robust counterfactual explanations for randomized ensembles, where existing methods often fail: the validity of naive counterfactuals is below 50% on most data sets and can fall to 20% on problems with many features. The authors develop a practical method that achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations, supported by theoretical guarantees for ensembles of convex base learners.
Counterfactual explanations describe how to modify a feature vector in order to flip the outcome of a trained classifier. Obtaining robust counterfactual explanations is essential to provide valid algorithmic recourse and meaningful explanations. We study the robustness of explanations of randomized ensembles, which are always subject to algorithmic uncertainty even when the training data is fixed. We formalize the generation of robust counterfactual explanations as a probabilistic problem and show the link between the robustness of ensemble models and the robustness of base learners. We develop a practical method with good empirical performance and support it with theoretical guarantees for ensembles of convex base learners. Our results show that existing methods give surprisingly low robustness: the validity of naive counterfactuals is below $50\%$ on most data sets and can fall to $20\%$ on problems with many features. In contrast, our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
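To make the notion of validity under algorithmic uncertainty concrete, the following is a minimal sketch and not the authors' method. It assumes a toy one-dimensional data set, bagged decision stumps as the randomized ensemble, and a hypothetical vote-fraction parameter `tau`. The training data stays fixed, the ensemble is retrained under different seeds, and validity is estimated as the fraction of retrained ensembles that still assign the counterfactual to the target class. The "robust" point is obtained with a simple heuristic that requires a larger share of base learners to agree; all function names are illustrative.

```python
import math
import random


def fit_stump(sample):
    """Pick the threshold t that minimizes errors of the rule 'predict 1 if x > t'."""
    best_t, best_err = None, len(sample) + 1
    for t in sorted(x for x, _ in sample):
        err = sum((x > t) != bool(y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def train_ensemble(data, n_learners, seed):
    """Bagging: each stump is fit on a bootstrap resample drawn with the given seed."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_learners)]


def predict(ensemble, x):
    """Majority vote over the stumps (ensemble is a list of thresholds)."""
    votes = sum(x > t for t in ensemble)
    return int(2 * votes > len(ensemble))


def counterfactual(ensemble, tau):
    """Smallest x (up to a tiny offset) at which at least a fraction tau of stumps vote 1.

    Assumption: tau = 0.5 mimics a naive counterfactual sitting on the decision
    boundary; a larger tau is a simple robustness heuristic used for illustration.
    """
    k = max(math.ceil(tau * len(ensemble)), len(ensemble) // 2 + 1)
    return sorted(ensemble)[k - 1] + 1e-6


def validity(ensembles, x_cf, target=1):
    """Fraction of retrained ensembles that classify x_cf as the target class."""
    return sum(predict(e, x_cf) == target for e in ensembles) / len(ensembles)


# Fixed toy training set: noisy labels around x = 0.5 (hypothetical data).
data_rng = random.Random(0)
data = []
for _ in range(40):
    x = data_rng.random()
    data.append((x, int(x + data_rng.gauss(0, 0.15) > 0.5)))

N_LEARNERS = 15
reference = train_ensemble(data, N_LEARNERS, seed=12345)
retrained = [train_ensemble(data, N_LEARNERS, seed=s) for s in range(100)]

x_naive = counterfactual(reference, tau=0.5)
x_robust = counterfactual(reference, tau=0.8)
v_naive = validity(retrained, x_naive)
v_robust = validity(retrained, x_robust)

print(f"naive:  x = {x_naive:.3f}, estimated validity = {v_naive:.2f}")
print(f"robust: x = {x_robust:.3f}, estimated validity = {v_robust:.2f}")
```

Because every stump in this toy setting predicts the target class to the right of its threshold, moving the counterfactual further right can never lower its estimated validity. The price is a larger distance from the initial observation, which is the trade-off the abstract describes.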