LG | Jul 24, 2025

Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods

arXiv:2507.18242v2 | 1 citation | h-index: 3 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work fills an empirical gap in the evaluation of LP-based boosting methods and offers practical insights for machine learning researchers and practitioners, though its advance over existing formulations is incremental.

The paper tackles the limited empirical evaluation of totally corrective boosting methods with a large-scale study of six LP-based formulations, two of them novel (NM-Boost and QRLP-Boost), across 20 datasets. It shows that, with shallow trees, these methods can match or outperform state-of-the-art heuristics such as XGBoost and LightGBM while producing sparser ensembles.

Despite their theoretical appeal, totally corrective boosting methods based on linear programming have received limited empirical attention. In this paper, we conduct the first large-scale experimental study of six LP-based boosting formulations, including two novel methods, NM-Boost and QRLP-Boost, across 20 diverse datasets. We evaluate the use of both heuristic and optimal base learners within these formulations, and analyze not only accuracy, but also ensemble sparsity, margin distribution, anytime performance, and hyperparameter sensitivity. We show that totally corrective methods can outperform or match state-of-the-art heuristics like XGBoost and LightGBM when using shallow trees, while producing significantly sparser ensembles. We further show that these methods can thin pre-trained ensembles without sacrificing performance, and we highlight both the strengths and limitations of using optimal decision trees in this context.
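To make "totally corrective" LP-based reweighting concrete, here is a minimal sketch that re-solves all ensemble weights at once with a linear program. It uses the classical soft-margin LPBoost formulation, not the paper's NM-Boost or QRLP-Boost, whose definitions are not given in this summary. The function name `lp_reweight`, the parameter `nu`, the synthetic data, and the fixed pool of decision stumps are all assumptions made for illustration; there is no column generation here, only a single solve over a pre-built pool, which is closest in spirit to the ensemble-thinning use case.

```python
import numpy as np
from scipy.optimize import linprog


def lp_reweight(H, y, nu=0.1):
    """Soft-margin LP over a fixed pool of base learners (illustrative sketch).

    H  : (n, T) array of base-learner predictions in {-1, +1}
    y  : (n,)   array of labels in {-1, +1}
    nu : value in (0, 1]; the slack penalty is D = 1 / (n * nu)

    Solves:  max  rho - D * sum(xi)
             s.t. y_i * (H w)_i + xi_i >= rho,  sum(w) = 1,  w >= 0,  xi >= 0
    """
    n, T = H.shape
    D = 1.0 / (n * nu)
    # Variable order: [w_1..w_T, xi_1..xi_n, rho]; linprog minimizes c @ x.
    c = np.concatenate([np.zeros(T), D * np.ones(n), [-1.0]])
    # Margin constraints rewritten as: -y_i (H w)_i - xi_i + rho <= 0
    A_ub = np.hstack([-(y[:, None] * H), -np.eye(n), np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Weights lie on the probability simplex.
    A_eq = np.concatenate([np.ones(T), np.zeros(n), [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (T + n) + [(None, None)]  # rho is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:T], res.x[-1]


# Synthetic two-class data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

# Fixed pool of decision stumps: each feature, three thresholds, both polarities.
columns = []
for j in range(X.shape[1]):
    for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
        h = np.where(X[:, j] > t, 1.0, -1.0)
        columns.extend([h, -h])
H = np.column_stack(columns)

w, rho = lp_reweight(H, y, nu=0.1)
n_active = int(np.sum(w > 1e-8))
train_acc = float(np.mean(np.sign(H @ w) == y))
print(f"active learners: {n_active} of {H.shape[1]}, rho = {rho:.3f}, "
      f"train accuracy = {train_acc:.3f}")
```

Because the solution is a vertex of the LP, many weights typically come out exactly zero, which is one way to see where the sparsity reported for LP-based ensembles can come from. Smaller values of `nu` penalize margin violations more heavily.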
