From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators
The paper studies the asymptotic risk of regularized estimators whose tuning parameter is chosen by cross-validation. It shows that their out-of-sample prediction loss converges to the risk function of shrinkage estimators tuned by SURE, a function that quantifies how risk varies with the true parameter.
This gives statisticians and machine learning practitioners a more fine-grained analysis of predictive performance, though the contribution is incremental in that it builds on existing theory.
We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.
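To make the limiting object concrete, the following is a minimal sketch of SURE-based tuning in the normal means model. It is not the paper's code. It assumes unit noise variance and uses soft thresholding (the normal means analogue of the lasso) as the shrinkage estimator, for which SURE has the closed form $\sum_i \min(x_i^2, \lambda^2) + 2\,\#\{i : |x_i| > \lambda\} - k$. The function names, the grid, and the simulated sparse mean vector are all illustrative assumptions. Because this criterion is piecewise in $\lambda$ and can have several local minima, the sketch searches a grid for the global minimum rather than relying on a local optimizer.

```python
import numpy as np


def soft_threshold(x, lam):
    """Soft-thresholding shrinkage estimator applied coordinate-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sure_soft_threshold(x, lam):
    """Stein's unbiased risk estimate of the total squared-error risk of
    soft thresholding at level lam, assuming x_i ~ N(mu_i, 1) independently."""
    x = np.asarray(x, dtype=float)
    shrink_term = np.sum(np.minimum(x ** 2, lam ** 2))
    divergence_term = 2.0 * np.sum(np.abs(x) > lam)
    return float(shrink_term + divergence_term - x.size)


def tune_by_sure(x, grid):
    """Return the grid point with the smallest SURE value (a global search
    over the grid) together with the SURE values at every grid point."""
    values = np.array([sure_soft_threshold(x, lam) for lam in grid])
    best = int(np.argmin(values))
    return float(grid[best]), values


# Illustrative simulation (hypothetical setup): a sparse mean vector.
rng = np.random.default_rng(0)
k = 200
mu = np.zeros(k)
mu[:20] = 3.0                        # 20 nonzero means, the rest are zero
x = mu + rng.standard_normal(k)      # one draw from the normal means model

grid = np.linspace(0.0, 4.0, 401)
lam_hat, sure_values = tune_by_sure(x, grid)
realized_loss = float(np.sum((soft_threshold(x, lam_hat) - mu) ** 2))

print(f"SURE-chosen threshold: {lam_hat:.2f}")
print(f"SURE at the chosen threshold: {sure_values.min():.2f}")
print(f"Realized squared-error loss: {realized_loss:.2f}")
```

Repeating the simulation over many draws of `x` and over different choices of `mu` traces out how the loss of the SURE-tuned estimator depends on the true parameter, which is the kind of parameter-dependent risk function the abstract contrasts with uniform worst-case bounds.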