Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
This work addresses the problem of reliable model evaluation and uncertainty quantification for practitioners who use early-stopped gradient descent in high-dimensional settings, and it gives a rigorous theoretical basis for choosing between cross-validation methods.
The paper analyzes cross-validation methods for early-stopped gradient descent in high-dimensional regression, proving that generalized cross-validation is inconsistent for risk estimation while leave-one-out cross-validation converges uniformly to the prediction risk, enabling consistent estimators for error distributions and prediction intervals with correct coverage.
We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features. In contrast, we show that LOOCV converges uniformly along the GD trajectory to the prediction risk. Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. Furthermore, by leveraging the individual LOOCV errors, we construct consistent estimators for the entire prediction error distribution along the GD trajectory and consistent estimators for a wide class of error functionals. This in particular enables the construction of pathwise prediction intervals based on GD iterates that have asymptotically correct nominal coverage conditional on the training data.
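To make the two estimators concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a constant step size, a zero initialization, and the update beta <- beta + (step / n) * X^T (y - X beta); none of these are specified in the abstract. The function names (gd_path, gcv_path, loocv_errors, prediction_interval) are hypothetical. GCV uses the fact that, under these assumptions, the residual at iterate k equals (I - (step / n) X X^T)^k y, so the fitted values are a linear smoother of y. LOOCV is computed by brute force, refitting GD n times. The interval shifts the point prediction by empirical quantiles of the leave-one-out errors, which is one plausible reading of "leveraging the individual LOOCV errors".

```python
import numpy as np

def gd_path(X, y, step, n_iters):
    """Iterates of gradient descent on least squares, started at zero."""
    n, p = X.shape
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_iters):
        beta = beta + (step / n) * X.T @ (y - X @ beta)
        path.append(beta.copy())
    return np.array(path)  # shape (n_iters + 1, p)

def gcv_path(X, y, step, n_iters):
    """GCV estimate at each iterate, using the smoother H_k = I - M^k."""
    n = X.shape[0]
    M = np.eye(n) - (step / n) * X @ X.T
    Mk = np.eye(n)  # M^0
    out = []
    for _ in range(n_iters + 1):
        resid = Mk @ y               # training residual at this iterate
        trace_H = n - np.trace(Mk)   # tr(H_k) = n - tr(M^k)
        out.append(np.mean(resid ** 2) / (1.0 - trace_H / n) ** 2)
        Mk = Mk @ M
    return np.array(out)

def loocv_errors(X, y, step, n_iters):
    """Brute-force leave-one-out errors; entry (i, k) is for point i, iterate k.

    Assumption: each refit normalizes the gradient by its own sample size n - 1.
    """
    n = X.shape[0]
    errs = np.empty((n, n_iters + 1))
    for i in range(n):
        keep = np.arange(n) != i
        path = gd_path(X[keep], y[keep], step, n_iters)
        errs[i] = y[i] - path @ X[i]
    return errs

def prediction_interval(x_new, beta_k, loo_errs_k, alpha):
    """Point prediction shifted by empirical quantiles of the LOO errors."""
    lo, hi = np.quantile(loo_errs_k, [alpha / 2, 1 - alpha / 2])
    center = x_new @ beta_k
    return center + lo, center + hi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, n_iters, step = 60, 30, 20, 0.1
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) / np.sqrt(p) + rng.standard_normal(n)
    gcv = gcv_path(X, y, step, n_iters)
    loo = np.mean(loocv_errors(X, y, step, n_iters) ** 2, axis=0)
    for k in (0, 5, 10, 20):
        print(f"k={k:2d}  GCV={gcv[k]:.3f}  LOOCV={loo[k]:.3f}")
```

The brute-force loop costs n full GD runs, so it only suits small problems; it is used here because it is exact under the stated assumptions and easy to check. Comparing both curves against a held-out estimate of the prediction risk on simulated data is a simple way to see the gap the abstract describes.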