PINE: Pruning Boosted Tree Ensembles with Conformal In-Distribution Prediction Equivalence
For practitioners using tree ensembles on tabular data, PINE offers a way to compress models without compromising decision consistency within the data distribution, addressing a key limitation of prior faithful pruning methods.
PINE is a pruning method for tree ensembles that guarantees prediction equivalence within an in-distribution region whose size is controlled by a conformal parameter α. It achieves up to 30% higher compression ratios than existing faithful pruning methods while maintaining comparable prediction preservation.
Tree ensembles are machine learning models with strong predictive performance and interpretability, and remain widely used for tabular data. Standard pruning methods for tree ensembles typically optimize an accuracy-compression trade-off and may change a subset of predictions, potentially compromising decision consistency. Faithful pruning methods address this issue by preserving prediction equivalence over the entire input space, but this requirement leads to lower compression ratios. We propose PINE, a pruning method that provides strong guarantees within an in-distribution region. PINE preserves prediction equivalence within this region and controls the region size using a single parameter $\alpha$ via conformal calibration. Experiments on 12 public tabular datasets show that PINE improves the compression ratio by up to $30\%$ over existing faithful pruning methods while preserving predictions at a comparable level.
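To make the idea of a conformally calibrated in-distribution region concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical nonconformity score (Euclidean distance to a fixed center), a standard split-conformal quantile as the threshold, and a toy ensemble of weighted decision stumps. All names (`conformal_threshold`, `score`, `predict`, `equivalent_in_region`, `FULL`, `PRUNED`) are illustrative assumptions. The abstract does not specify how PINE defines the region or searches for the pruned ensemble.

```python
import math

def conformal_threshold(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.

    Returns infinity when that rank exceeds n (region is the whole space).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def score(x, center):
    """Assumed nonconformity score: distance from a reference center."""
    return math.dist(x, center)  # requires Python 3.8+

def predict(stumps, x):
    """Binary prediction of a toy ensemble of stumps.

    Each stump is (feature_index, threshold, left_value, right_value).
    """
    total = sum(lv if x[f] <= t else rv for f, t, lv, rv in stumps)
    return 1 if total > 0 else 0

def equivalent_in_region(full, pruned, points, center, tau):
    """Check that both ensembles agree on every point with score <= tau."""
    return all(
        predict(full, x) == predict(pruned, x)
        for x in points
        if score(x, center) <= tau
    )

# Toy ensembles: the second stump only changes predictions for x[0] > 10,
# so dropping it is harmless inside a small region around the origin.
FULL = [(0, 0.0, -1.0, 1.0), (0, 10.0, 0.0, -5.0)]
PRUNED = [(0, 0.0, -1.0, 1.0)]

if __name__ == "__main__":
    center = (0.0, 0.0)
    calibration = [(i * 0.5, 0.0) for i in range(-4, 5)]
    tau = conformal_threshold([score(x, center) for x in calibration], alpha=0.5)
    far_point = (12.0, 0.0)
    print("threshold:", tau)
    print("equivalent in region:",
          equivalent_in_region(FULL, PRUNED, calibration + [far_point], center, tau))
    print("agree on far point:", predict(FULL, far_point) == predict(PRUNED, far_point))
```

In this sketch, a smaller $\alpha$ yields a larger threshold and therefore a larger region in which equivalence must hold, which leaves less room for pruning. A larger $\alpha$ shrinks the region and permits more aggressive pruning. Note that the check above only tests a finite set of points, whereas a guarantee over the whole region, as claimed for PINE, would require reasoning about every input in it.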