Generating Counterfactual Explanations Using Cardinality Constraints
The work addresses the problem of overly complex counterfactuals for users who need transparent AI decisions, though the contribution is incremental, since it builds on existing counterfactual methods.
The paper generates interpretable counterfactual explanations for machine learning predictions by introducing a cardinality constraint that limits the number of features allowed to differ from the original example, which makes the resulting counterfactuals easier to understand.
Providing explanations of how machine learning algorithms work and/or make particular predictions is one of the main tools for improving their trustworthiness, fairness and robustness. Among the most intuitive types of explanations are counterfactuals: examples that differ from a given point only in the prediction target and some set of features, showing which features need to be changed in the original example to flip its prediction. However, such counterfactuals can differ from the original example in many features, making them difficult to interpret. In this paper, we propose to explicitly add a cardinality constraint to counterfactual generation that limits how many features can differ from the original example, thus providing more interpretable and easily understandable counterfactuals.
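
As a rough illustration of what a cardinality constraint means in practice, the following Python sketch finds the closest counterfactual to a point under a linear classifier while changing at most k features. It is not the method proposed in the paper: the linear model, the brute-force enumeration of feature subsets, and the names `score`, `sparse_counterfactual`, `k`, and `margin` are all assumptions made for this example.

```python
from itertools import combinations
import math


def score(w, b, x):
    """Linear decision score; the predicted class is 1 if score >= 0, else 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sparse_counterfactual(w, b, x, k, margin=1e-6):
    """Closest (L2) point to x that flips the prediction while changing
    at most k features. Returns (x_cf, changed_indices) or None.

    Hypothetical helper for illustration only.
    """
    s = score(w, b, x)
    # Score the counterfactual must reach: just across the decision boundary.
    target = -margin if s >= 0 else margin
    best = None
    for size in range(1, k + 1):  # cardinality constraint: |subset| <= k
        for subset in combinations(range(len(x)), size):
            norm_sq = sum(w[i] ** 2 for i in subset)
            if norm_sq == 0:
                continue  # these features cannot move the score
            # Minimum-norm change restricted to the chosen features.
            step = (target - s) / norm_sq
            dist = abs(step) * math.sqrt(norm_sq)
            if best is None or dist < best[0]:
                x_cf = list(x)
                for i in subset:
                    x_cf[i] += step * w[i]
                best = (dist, x_cf, subset)
    return None if best is None else (best[1], best[2])


# Example with made-up weights and a made-up input point.
w, b = [2.0, -1.0, 0.5], -1.0
x = [1.0, 0.5, 2.0]
x_cf, changed = sparse_counterfactual(w, b, x, k=1)
print("original score:", score(w, b, x))
print("counterfactual:", x_cf, "changed features:", changed)
```

For a linear model the enumeration is not strictly needed, because the best subset always consists of the k features with the largest absolute weights; it is kept here only to make the constraint explicit. For non-linear models the subset search is combinatorial, which is why the constraint has to be handled explicitly during counterfactual generation.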