ML, LG, ST, ME · Sep 27, 2025

Statistical Inference for Gradient Boosting Regression

arXiv:2509.23127v1 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses statistical inference for gradient boosting, which matters to practitioners who need reliable uncertainty estimates in regression tasks. It appears incremental because it builds on existing regularization methods.

The authors address statistical inference and uncertainty quantification in gradient boosting regression by proposing a unified framework that integrates dropout or parallel training with regularization. The framework improves signal recovery and overall performance, and it supports the construction of confidence intervals, prediction intervals, and hypothesis tests for variable importance.

Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure that allows for a central limit theorem (CLT) for boosting. With these enhancements, we surprisingly find that increasing the dropout rate and the number of trees grown in parallel at each iteration substantially enhances signal recovery and overall performance. Our resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance. Numerical experiments demonstrate that our algorithms perform well, interpolate between regularized boosting and random forests, and confirm the validity of their built-in statistical inference procedures.
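The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of combining dropout with a regularized boosting update. It makes several assumptions that are not taken from the paper: decision stumps as base learners, a shrunken average of trees standing in for the regularization step, and the hypothetical names `fit_stump`, `predict_stump`, `boost_with_dropout`, and `predict`. The parallel-training variant is not shown.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to residuals r by exhaustive search."""
    # Fallback: a constant stump that predicts the mean on both sides.
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(x.shape[1]):
        # Skip the largest value so the right-hand side is never empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left mean, right mean)

def predict_stump(stump, x):
    j, t, lm, rm = stump
    return np.where(x[:, j] <= t, lm, rm)

def boost_with_dropout(x, y, n_rounds=50, dropout=0.5, shrink=0.5, seed=0):
    """Boosting where each round fits residuals of a randomly thinned ensemble."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_rounds):
        # Drop each existing stump independently with probability `dropout`.
        kept = [s for s in stumps if rng.random() >= dropout]
        if kept:
            # Assumed regularization: a shrunken average, not a running sum.
            partial = shrink * np.mean([predict_stump(s, x) for s in kept], axis=0)
        else:
            partial = np.zeros(len(y))
        stumps.append(fit_stump(x, y - partial))
    return stumps

def predict(stumps, x, shrink=0.5):
    # Heuristic assumption: the shrunken average settles near
    # shrink / (1 + shrink) times the signal, so rescale by (1 + shrink) / shrink.
    return (1 + shrink) * np.mean([predict_stump(s, x) for s in stumps], axis=0)
```

In this sketch, `dropout=0` reduces to fitting each stump against the full averaged ensemble, while `dropout=1` fits every stump directly to `y`, so the ensemble becomes a plain average of independently fitted learners. That is one loose way to picture the interpolation between regularized boosting and random forests that the abstract mentions, though the paper's actual construction may differ.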

