ST · LG · ML · Apr 25, 2023

Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

arXiv:2304.13016v2 · 12 citations · h-index: 17
Originality: Incremental advance
AI Analysis

The paper provides a theoretical foundation for efficient ensemble tuning in high-dimensional regression. The advance is incremental, since it builds on existing ridge and subsampling methods.

The paper addresses the tuning of subsample-based ridge ensembles by analyzing their prediction risk under proportional asymptotics. It proves that the optimal full ridgeless ensemble attains the same risk as the optimal ridge predictor. It also shows strong uniform consistency of generalized cross-validation (GCV) for risk estimation, which enables tuning without sample splitting.

We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $λ$ and the limiting subsample aspect ratio $φ_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(λ, φ_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.
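The procedure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's own code: the function names `subsample_ridge_ensemble` and `gcv_estimate` are hypothetical, the ensemble averages a finite number `M` of random subsamples instead of all possible subsamples, and GCV is taken to be the full-data training error divided by $(1 - \mathrm{tr}(S)/n)^2$, where $S$ is the averaged smoother matrix. The paper's exact definitions may differ.

```python
import numpy as np

def subsample_ridge_ensemble(X, y, k, lam, M, rng):
    """Average M ridge fits, each on a random subsample of size k.

    Returns the averaged coefficients and the trace of the averaged
    smoother matrix on the full data (needed for GCV).
    lam = 0 gives the ridgeless (minimum-norm least squares) fit.
    """
    n, p = X.shape
    beta = np.zeros(p)
    trace_S = 0.0
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        Xs, ys = X[idx], y[idx]
        if lam == 0:
            # Ridgeless limit: the pseudoinverse maps ys to the min-norm solution.
            A = np.linalg.pinv(Xs)
        else:
            # A = (Xs^T Xs / k + lam I)^{-1} Xs^T / k, a p-by-k matrix.
            A = np.linalg.solve(Xs.T @ Xs / k + lam * np.eye(p), Xs.T / k)
        beta += A @ ys / M
        # Only rows in idx contribute to the diagonal of this fit's smoother.
        trace_S += np.trace(Xs @ A) / M
    return beta, trace_S

def gcv_estimate(X, y, beta, trace_S):
    """GCV risk estimate: full-data training error over (1 - tr(S)/n)^2."""
    n = X.shape[0]
    train_err = np.mean((y - X @ beta) ** 2)
    return train_err / (1.0 - trace_S / n) ** 2

# Synthetic example: pick the subsample size k by GCV for a ridgeless ensemble.
rng = np.random.default_rng(0)
n, p = 200, 100
X = rng.standard_normal((n, p))
y = X @ (rng.standard_normal(p) / np.sqrt(p)) + rng.standard_normal(n)

scores = {}
for k in (20, 40, 60, 80):  # subsample aspect ratio p / k ranges from 5 to 1.25
    beta_k, tr_k = subsample_ridge_ensemble(X, y, k=k, lam=0.0, M=50, rng=rng)
    scores[k] = gcv_estimate(X, y, beta_k, tr_k)
best_k = min(scores, key=scores.get)
print("GCV estimates by subsample size:", scores)
print("selected subsample size:", best_k)
```

Here the explicit penalty is fixed at $λ = 0$ and only the subsample size varies, mirroring the idea that tuning $φ_s$ alone can stand in for tuning $λ$. No held-out data is used: the same `n` observations serve for fitting and for the GCV estimate.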

