Ranking Counterfactual Explanations
This work addresses the problem of making AI outcomes more understandable for end-users by improving counterfactual explanations, though it is incremental as it builds on existing explanation formalizations.
The paper tackles the lack of a formal study of counterfactual explanations in AI by proposing a definition and a method to rank multiple counterfactuals, finding that a single optimal explanation emerges in most cases across 12 datasets, with metrics showing higher representativeness and broader explanatory range compared to random minimal counterfactuals.
AI-driven outcomes can be challenging for end-users to understand. Explanations can address two key questions: "Why this outcome?" (factual) and "Why not another?" (counterfactual). While substantial efforts have been made to formalize factual explanations, a precise and comprehensive study of counterfactual explanations is still lacking. This paper proposes a formal definition of counterfactual explanations, proving some properties they satisfy, and examining the relationship with factual explanations. Given that multiple counterfactual explanations generally exist for a specific case, we also introduce a rigorous method to rank these counterfactual explanations, going beyond a simple minimality condition, and to identify the optimal ones. Our experiments with 12 real-world datasets highlight that, in most cases, a single optimal counterfactual explanation emerges. We also demonstrate, via three metrics, that the selected optimal explanation exhibits higher representativeness and can explain a broader range of elements than a random minimal counterfactual. This result highlights the effectiveness of our approach in identifying more robust and comprehensive counterfactual explanations.