Optimal Counterfactual Explanations in Tree Ensembles
This paper addresses the need for reliable and efficient counterfactual explanations in machine learning, particularly for tree-based models, through incremental improvements in optimization.
The paper tackles the problem of generating trustworthy counterfactual explanations for tree ensembles by proposing a model-based search built on efficient mixed-integer programming formulations. The approach requires a computational effort orders of magnitude smaller than previous mathematical programming algorithms and provides optimal explanations within seconds on large data sets and tree ensembles.
Counterfactual explanations are usually generated through heuristics that are sensitive to the search's initial conditions. The absence of guarantees of performance and robustness hinders trustworthiness. In this paper, we take a disciplined approach towards counterfactual explanations for tree ensembles. We advocate for a model-based search aiming at "optimal" explanations and propose efficient mixed-integer programming approaches. We show that isolation forests can be modeled within our framework to focus the search on plausible explanations with a low outlier score. We provide comprehensive coverage of additional constraints that model important objectives, heterogeneous data types, structural constraints on the feature space, along with resource and actionability restrictions. Our experimental analyses demonstrate that the proposed search approach requires a computational effort that is orders of magnitude smaller than previous mathematical programming algorithms. It scales up to large data sets and tree ensembles, where it provides, within seconds, systematic explanations grounded on well-defined models solved to optimality.
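To make the notion of an "optimal" explanation concrete, the following is a minimal sketch of an exhaustive search on a toy ensemble. It is not the mixed-integer programming approach described above: it enumerates every cell of the grid induced by the split thresholds, which is only feasible for very small ensembles. The tree encoding, the names `TREES`, `predict`, and `optimal_counterfactual`, the majority-vote rule, the L1 cost, and the tolerance `eps` are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy ensemble. A tree is either a leaf (an int class label)
# or a tuple (feature_index, threshold, left, right), where a sample with
# x[feature_index] <= threshold follows the left branch.
TREES = [
    (0, 2.0, 0, 1),
    (1, 3.0, 0, 1),
    (0, 4.0, 0, (1, 1.0, 0, 1)),
]


def predict_tree(tree, x):
    """Walk one tree from the root to a leaf and return the leaf label."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def predict(trees, x):
    """Binary majority vote over all trees (an assumed aggregation rule)."""
    votes = sum(predict_tree(t, x) for t in trees)
    return 1 if 2 * votes > len(trees) else 0


def collect_thresholds(tree, out):
    """Gather the split thresholds used for each feature."""
    if not isinstance(tree, int):
        feature, threshold, left, right = tree
        out.setdefault(feature, set()).add(threshold)
        collect_thresholds(left, out)
        collect_thresholds(right, out)
    return out


def optimal_counterfactual(trees, x_hat, target, eps=1e-6):
    """Return the L1-closest point to x_hat classified as `target`.

    The ensemble is constant on each cell of the threshold grid, so it
    suffices to test the projection of x_hat onto every cell. Intervals
    are open on the left, hence the small offset eps; the result is
    optimal only up to that tolerance.
    """
    splits = {}
    for tree in trees:
        collect_thresholds(tree, splits)

    per_feature = []
    for j, value in enumerate(x_hat):
        bounds = [float("-inf")] + sorted(splits.get(j, ())) + [float("inf")]
        candidates = []
        for lo, hi in zip(bounds, bounds[1:]):  # interval (lo, hi]
            if value <= lo:
                candidates.append(lo + eps)
            elif value > hi:
                candidates.append(hi)
            else:
                candidates.append(value)
        per_feature.append(candidates)

    best, best_cost = None, float("inf")
    for x in product(*per_feature):
        if predict(trees, x) == target:
            cost = sum(abs(a - b) for a, b in zip(x, x_hat))
            if cost < best_cost:
                best, best_cost = x, cost
    return best, best_cost


if __name__ == "__main__":
    x_cf, cost = optimal_counterfactual(TREES, (1.0, 2.0), target=1)
    print(x_cf, cost)
```

The number of cells grows exponentially with the number of features, which is one reason a mathematical programming formulation is preferable to enumeration beyond toy examples.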