Optimal ridge regularization revisited
For practitioners using ridge regression, the proposed procedure offers a principled way to set the regularization strength without cross-validation, though it requires the generative parameters to be known or estimated from the sample.
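The paper presents an iterative procedure to compute the optimal ridge regularization strength from generative parameters. Combined with sample-based parameter estimates, it achieves near-optimal generalization on synthetic data across a wide range of sample sizes, aspect ratios, and noise levels, at the added cost of one or two preliminary ridge regressions.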
We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.
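To make the fixed-$X$ objective concrete, the following is a minimal sketch, not the paper's iterative procedure. It assumes the estimator $\hat\beta_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$, the in-sample risk $\frac{1}{n}\,\mathbb{E}\,\|X\hat\beta_\lambda - X\beta\|^2$, and known generative parameters: the eigenvalues $s_i^2$ of $X^\top X$, the coordinates $\alpha_i$ of the true coefficient vector in the matching eigenbasis, and the noise variance $\sigma^2$. Under these assumptions the risk has the closed form $\frac{1}{n}\sum_i (\lambda^2 s_i^2 \alpha_i^2 + \sigma^2 s_i^4)/(s_i^2+\lambda)^2$. The sketch minimizes it by a brute-force log-grid search followed by golden-section refinement. All function and parameter names are hypothetical.

```python
import math

def fixed_x_risk(lam, eigvals, alpha, sigma2, n):
    """Assumed fixed-X prediction risk of ridge at penalty lam.

    eigvals: eigenvalues s_i^2 of X^T X (assumed known)
    alpha:   true coefficients expressed in the eigenbasis of X^T X
    sigma2:  noise variance; n: sample size
    """
    total = 0.0
    for s2, a in zip(eigvals, alpha):
        bias_sq = lam ** 2 * s2 * a ** 2   # squared-bias contribution
        variance = sigma2 * s2 ** 2        # variance contribution
        total += (bias_sq + variance) / (s2 + lam) ** 2
    return total / n

def minimize_risk(eigvals, alpha, sigma2, n,
                  lo=1e-6, hi=1e6, grid=200, iters=80):
    """Grid search over log(lam), then golden-section refinement.

    This is a generic scalar search standing in for the paper's
    iterative procedure, which is not reproduced here.
    """
    def f(t):
        return fixed_x_risk(math.exp(t), eigvals, alpha, sigma2, n)

    step = (math.log(hi) - math.log(lo)) / (grid - 1)
    ts = [math.log(lo) + k * step for k in range(grid)]
    k = min(range(grid), key=lambda j: f(ts[j]))
    # Bracket the best grid point with its two neighbours.
    a, b = ts[max(k - 1, 0)], ts[min(k + 1, grid - 1)]
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)

# Example with made-up parameters: every alpha_i^2 equals 0.25.
lam_star = minimize_risk([4.0, 1.0, 0.25], [0.5, -0.5, 0.5], sigma2=0.1, n=10)
```

A useful sanity check under the same assumptions: when every $\alpha_i^2$ equals a common value $a$, each term of the sum is minimized at $\lambda = \sigma^2 / a$, so the search should return a value close to $0.1 / 0.25 = 0.4$ in the example above. When the $\alpha_i^2$ differ, the risk can have several local minima in $\lambda$, which is why the sketch starts from a coarse grid rather than a purely local search.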