LG | Jan 26, 2023

Finding Regions of Counterfactual Explanations via Robust Optimization

arXiv:2301.11113v3 | 31 citations | h-index: 39
Originality: Incremental advance
AI Analysis

This work aims to improve explainability and bias detection in classification models by offering users multiple robust counterfactual options. The advance is incremental: it builds on existing counterfactual methods and adds a focus on robustness.

Existing counterfactual explanation methods typically return a single explanation, which may not be feasible for the user. The paper develops an iterative method that computes robust counterfactual explanations, i.e. explanations that remain valid under small feature perturbations. The result is a region of explanations from which the user can choose. Convergence is proven for common models including logistic regression, decision trees, random forests, and neural networks, and the experiments indicate that the method is efficient on common data sets.

Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimally perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e., CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.
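
To make the idea of a robust CE concrete, here is a minimal sketch for the simplest case: a linear classifier with score w·x + b (as in logistic regression) and perturbations bounded in the Euclidean norm by a radius rho. It alternates between a "master" step that finds the closest point satisfying the perturbation scenarios collected so far and an "adversary" step that searches for a perturbation breaking that point. This alternating scheme is a common pattern in robust optimization; whether it matches the paper's algorithm in detail is an assumption, and every name here (`robust_ce_linear`, `rho`, and so on) is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def project_onto_halfspace(x0, w, c):
    """Closest point (Euclidean distance) to x0 that satisfies w.x >= c."""
    gap = c - dot(w, x0)
    if gap <= 0:
        return list(x0)  # x0 already satisfies the constraint
    step = gap / dot(w, w)
    return [xi + step * wi for xi, wi in zip(x0, w)]


def worst_perturbation(w, rho):
    """Perturbation s with ||s|| <= rho that lowers the score w.(x + s) most."""
    n = norm(w)
    return [-rho * wi / n for wi in w]


def robust_ce_linear(x0, w, b, rho, tol=1e-9, max_iter=50):
    """Illustrative alternating scheme (an assumption, not the paper's code).

    Returns a point x closest to x0 such that w.(x + s) + b >= 0
    for every perturbation s with ||s|| <= rho.
    """
    scenarios = [[0.0] * len(w)]  # start with the unperturbed scenario only
    for _ in range(max_iter):
        # Master step: each scenario s demands w.x >= -b - w.s.
        # The constraints are parallel, so only the tightest one matters.
        c = max(-b - dot(w, s) for s in scenarios)
        x = project_onto_halfspace(x0, w, c)

        # Adversary step: look for the perturbation that hurts x the most.
        s = worst_perturbation(w, rho)
        violation = -(dot(w, x) + dot(w, s) + b)
        if violation <= tol:
            return x  # no perturbation in the ball flips the decision back
        scenarios.append(s)
    raise RuntimeError("no robust counterfactual found within max_iter")


x_robust = robust_ce_linear(x0=[0.0, 0.0], w=[3.0, 4.0], b=-5.0, rho=0.5)
```

With these made-up numbers the result is approximately (0.9, 1.2), and every point within distance 0.5 of it is still classified positively, so that ball is a small region of valid CEs. For a linear model the adversary step has a closed form and the loop stops after two passes; for trees or neural networks both steps would need a proper optimization solver.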

Code Implementations: 1 repo