Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles
This addresses the issue of model complexity and efficiency for users of tree ensembles in tabular data, though it is incremental as it builds on existing pruning concepts.
The paper tackles the problem of large tree ensembles lacking interpretability and having long inference times by introducing a pruning method that produces a reduced model functionally identical to the original, guaranteeing unchanged predictions for any input. The result is a significant reduction in ensemble size without altering model behavior, preserving state-of-the-art performance at a fraction of the original size.
Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.