Error Reduction from Stacked Regressions
This work targets statisticians and data scientists who want better-performing regression ensembles. It offers a computationally efficient alternative with proven theoretical guarantees, though the contribution is incremental because it builds on existing stacking techniques.
The paper aims to improve the predictive accuracy of regression ensembles by learning the combination weights through regularized empirical risk minimization under a nonnegativity constraint. For linear least-squares estimators on nested subspaces, the resulting stacked estimator achieves strictly smaller population risk than the best single estimator, with gains of up to 30% in low signal-to-noise scenarios.
Stacking regressions is an ensemble technique that forms linear combinations of different regression estimators to enhance predictive accuracy. The conventional approach uses cross-validation data to generate predictions from the constituent estimators, and least-squares with nonnegativity constraints to learn the combination weights. In this paper, we learn these weights analogously by minimizing a regularized version of the empirical risk subject to a nonnegativity constraint. When the constituent estimators are linear least-squares projections onto nested subspaces separated by at least three dimensions, we show that, thanks to an adaptive shrinkage effect, the resulting stacked estimator has strictly smaller population risk than the best single estimator among them, with more significant gains when the signal-to-noise ratio is small. Here "best" refers to an estimator that minimizes a model selection criterion such as AIC or BIC. In other words, in this setting, the best single estimator is inadmissible. Because the optimization problem can be reformulated as isotonic regression, the stacked estimator requires the same order of computation as the best single estimator, making it an attractive alternative in terms of both performance and implementation.
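For orientation, the conventional approach described above (cross-validated predictions followed by nonnegative least squares) can be sketched as follows. This is a minimal illustration, not the regularized estimator proposed in the paper: the simulated data, the choice of nested models built from the first d columns of the design matrix, and the helper names `nested_cv_predictions` and `stack_weights` are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls


def nested_cv_predictions(X, y, dims, n_folds=5, seed=0):
    """Out-of-fold predictions from least-squares fits on the first d columns of X.

    Column k of the returned matrix holds the predictions of the model
    that uses the first dims[k] columns (hypothetical nested design).
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    Z = np.zeros((n, len(dims)))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        for k, d in enumerate(dims):
            # Fit on the training folds, predict on the held-out fold.
            coef, *_ = np.linalg.lstsq(
                X[train_mask, :d], y[train_mask], rcond=None
            )
            Z[test_idx, k] = X[test_idx, :d] @ coef
    return Z


def stack_weights(Z, y):
    """Nonnegative least-squares combination weights (conventional stacking)."""
    weights, _ = nnls(Z, y)
    return weights


# Simulated data (assumption): only the first three coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 200, 12
X = rng.standard_normal((n, p))
beta = np.concatenate([np.array([1.0, 0.5, 0.25]), np.zeros(p - 3)])
y = X @ beta + rng.standard_normal(n)

# Nested subspaces separated by three dimensions each.
dims = [3, 6, 9, 12]
Z = nested_cv_predictions(X, y, dims)
weights = stack_weights(Z, y)
print("stacking weights:", weights)
```

Because every single estimator corresponds to a feasible weight vector (a unit vector), the nonnegative least-squares fit on the cross-validated predictions is never worse, in terms of that empirical criterion, than any one column on its own. The paper's risk guarantee concerns population risk under its own regularized objective, which this sketch does not reproduce.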