Explaining Predictions from Tree-based Boosting Ensembles
This work addresses the need for interpretable AI in tree-based ensembles, but its contribution is incremental, since it adapts a known method from random forests to GBDTs.
The paper tackles the problem of generating counterfactual explanations for individual predictions from Gradient Boosting Decision Trees (GBDTs). It extends an existing random forest method to account for the sequential dependency between trees and for the fact that each tree is trained on negative gradients rather than on class labels, resulting in a model-specific approach that avoids surrogate models.
Understanding how "black-box" models arrive at their predictions has sparked significant interest both within and outside the AI community. We approach this problem by generating local explanations of individual predictions made by tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation for this instance, that is, the minimal perturbation of the instance that flips the prediction to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take the structure of the original model into account, and/or (2) built on a surrogate model fitted on top of the original model, which is not guaranteed to represent the original model accurately. A method exists specifically for random forests; we wish to extend it to GBDTs. This requires accounting for (1) the sequential dependency between trees and (2) the fact that each tree is trained on the negative gradients of the loss instead of on the original labels.
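To make the two GBDT-specific points concrete, the following is a minimal Python sketch under stated assumptions; it is not the paper's method. It builds a hypothetical toy GBDT of depth-1 trees (`Stump`, `fit_gbdt`) trained with log-loss, where each tree is fitted to the negative gradient y - sigmoid(F(x)) of the ensemble built so far, so every tree depends on the ones before it. In place of the extended random forest method, it uses a brute-force counterfactual search (`counterfactual`, also hypothetical) over values at or just past the trees' split thresholds. Because an axis-aligned ensemble is piecewise constant between thresholds, this search returns the L1-closest point with the opposite prediction (up to `eps`), but its cost is exponential in the number of features.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 regression tree: x[feature] <= threshold goes left."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x):
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r):
    """Least-squares stump fitted to targets r (here: the negative gradients)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for x, ri in zip(X, r) if x[f] <= t]
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, Stump(f, t, lv, rv))
    return best[1]


def fit_gbdt(X, y, n_trees=20, lr=0.5):
    """Toy binary GBDT with log-loss. Tree m is fitted to y - sigmoid(F(x)),
    where F is the score of trees 1..m-1, not to the labels y themselves."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))  # initial log-odds
    F = [base] * len(X)
    trees = []
    for _ in range(n_trees):
        neg_grad = [yi - sigmoid(Fi) for yi, Fi in zip(y, F)]
        stump = fit_stump(X, neg_grad)
        trees.append(stump)
        F = [Fi + lr * stump.predict(x) for Fi, x in zip(F, X)]
    return base, lr, trees


def score(model, x):
    base, lr, trees = model
    return base + lr * sum(t.predict(x) for t in trees)


def predict(model, x):
    return int(score(model, x) > 0)


def counterfactual(model, x, eps=1e-6):
    """Brute-force minimal-L1 class flip, searching only values taken from
    the ensemble's split thresholds (assumes eps is smaller than threshold gaps)."""
    _, _, trees = model
    target = 1 - predict(model, x)
    candidates = []
    for f in range(len(x)):
        cands = {x[f]}
        for t in trees:
            if t.feature == f:
                # t.threshold stays on the left side, t.threshold + eps crosses right
                cands.update({t.threshold, t.threshold + eps})
        candidates.append(sorted(cands))
    best = None
    for x_new in itertools.product(*candidates):
        if predict(model, x_new) == target:
            dist = sum(abs(a - b) for a, b in zip(x, x_new))
            if best is None or dist < best[0]:
                best = (dist, list(x_new))
    return best  # (distance, perturbed instance), or None if no flip exists


if __name__ == "__main__":
    X = [[1, 1], [2, 1], [3, 2], [6, 5], [7, 6], [8, 5]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_gbdt(X, y)
    print(counterfactual(model, [2, 1]))
```

On this toy data every stump splits feature 0 at 4.5, so the counterfactual for [2, 1] moves only that feature just past 4.5. Later stumps have smaller leaf magnitudes because they fit the residual error left by earlier ones, which is the sequential dependency a GBDT-specific method has to handle and a random forest method does not.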