Robust Counterfactual Explanations in Machine Learning: A Survey
The survey tackles the risk of invalid explanations for individuals affected by ML predictions, but its contribution is incremental because it synthesizes existing work.
This survey addresses the problem of robustness in counterfactual explanations for machine learning, highlighting severe issues in current methods and reviewing existing solutions and their limitations.
Counterfactual explanations (CEs) are advocated as being ideally suited to providing algorithmic recourse for subjects affected by the predictions of machine learning models. While CEs can be beneficial to affected individuals, recent work has exposed severe issues related to the robustness of state-of-the-art methods for obtaining CEs. Since a lack of robustness may compromise the validity of CEs, techniques to mitigate this risk are in order. In this survey, we review works in the rapidly growing area of robust CEs and perform an in-depth analysis of the forms of robustness they consider. We also discuss existing solutions and their limitations, providing a solid foundation for future developments.
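To make the validity concern concrete, the following is a minimal sketch, not taken from the survey, of how a CE can become invalid under one form of robustness failure: a small change to the model. It assumes a hypothetical two-feature linear classifier; the names `predict` and `counterfactual`, the weights, and the `margin` parameter are illustrative assumptions rather than any published method.

```python
# Illustrative sketch only: a toy linear classifier, not a method from the survey.

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Return 1 (favourable outcome) if the score is non-negative, else 0."""
    return int(score(w, b, x) >= 0)

def counterfactual(w, b, x, margin):
    """Move x along w until its score equals `margin` (hypothetical helper).

    A margin near zero gives a closest-point CE just past the decision
    boundary; a larger margin trades proximity for robustness.
    """
    norm_sq = sum(wi * wi for wi in w)
    step = (margin - score(w, b, x)) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]

# Original model and an individual who receives the unfavourable outcome.
w, b = [1.0, 1.0], -1.0
x = [0.0, 0.0]

# Slightly shifted model, standing in for a retrained version (assumed values).
w_new, b_new = [1.0, 1.0], -1.1

fragile_ce = counterfactual(w, b, x, margin=0.01)  # barely crosses the boundary
robust_ce = counterfactual(w, b, x, margin=0.2)    # crosses with some slack

print("original prediction:", predict(w, b, x))
print("fragile CE, old model:", predict(w, b, fragile_ce))
print("fragile CE, new model:", predict(w_new, b_new, fragile_ce))
print("robust CE, new model:", predict(w_new, b_new, robust_ce))
```

In this toy setting the fragile CE flips the original model's prediction but is rejected again after the shift, while the CE computed with a larger margin stays valid at the cost of lying farther from the original input. Real methods face the same trade-off with non-linear models and less predictable changes.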