MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation
The work addresses a practical bottleneck in Explainable AI, namely applying counterfactual explanation to real-world applications with complex data, though it is an incremental improvement over existing optimization-based methods.
The paper tackles the problem of generating counterfactual explanations for machine learning predictions when the model is non-differentiable or the categorical attributes have many distinct values. It proposes the MACE framework, which improves validity, sparsity, and proximity in experiments on public datasets.
Counterfactual explanation is an important Explainable AI technique for explaining machine learning predictions. Although actively studied, existing optimization-based methods often assume that the underlying machine learning model is differentiable and treat categorical attributes as continuous ones, which restricts their real-world use when categorical attributes have many distinct values or the model is non-differentiable. To make counterfactual explanation suitable for real-world applications, we propose a novel framework, Model-Agnostic Counterfactual Explanation (MACE), which adopts a newly designed pipeline that efficiently handles non-differentiable machine learning models over a large number of feature values. Within MACE, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
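To make the idea of improving proximity without gradients concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a black-box `predict` function that returns a class label, continuous features only, and a counterfactual `cf` that is already valid. The function name `refine_proximity`, its parameters, and the toy `step_model` are hypothetical and introduced only for illustration. The sketch uses coordinate-wise bisection, which assumes the prediction flips only once along each coordinate between the counterfactual and the query.

```python
from typing import Callable, List, Sequence


def refine_proximity(
    predict: Callable[[Sequence[float]], int],
    x: Sequence[float],
    cf: Sequence[float],
    target: int,
    n_rounds: int = 20,
) -> List[float]:
    """Pull a valid counterfactual `cf` toward the query `x` using only
    black-box calls to `predict` (no gradients).

    Illustrative assumption: coordinate-wise bisection, one feature at a time.
    """
    if predict(cf) != target:
        raise ValueError("cf must already be predicted as the target class")

    best = list(cf)
    for i in range(len(best)):
        trial = list(best)

        # First try to revert the feature completely; this also helps sparsity.
        trial[i] = x[i]
        if predict(trial) == target:
            best[i] = x[i]
            continue

        # Otherwise bisect between a value known to be valid and the
        # original value, which is known to be invalid.
        valid, invalid = best[i], x[i]
        for _ in range(n_rounds):
            mid = (valid + invalid) / 2.0
            trial[i] = mid
            if predict(trial) == target:
                valid = mid
            else:
                invalid = mid
        best[i] = valid  # always a value that was checked to be valid
    return best


def step_model(v: Sequence[float]) -> int:
    """Toy non-differentiable classifier (hypothetical): a hard threshold."""
    return 1 if v[0] + v[1] >= 10 else 0


query = [2.0, 3.0]          # predicted as class 0
counterfactual = [9.0, 9.0]  # predicted as class 1, but far from the query
refined = refine_proximity(step_model, query, counterfactual, target=1)
print(refined)
```

In this toy case the first feature can be reverted entirely to its original value, and the second is moved to just above the decision threshold, so the refined example stays valid while changing fewer features by a smaller amount. Categorical attributes, which the abstract highlights, are not covered by this sketch.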