Ridge Regression: Structure, Cross-Validation, and Sketching
This work proposes methodological improvements for ridge regression, a regularization method widely used in statistics and machine learning. The contribution is incremental: it builds on existing theory with specific refinements.
The paper tackles three fundamental problems in ridge regression: understanding the structure of the estimator, correcting the bias of cross-validation when it is used to select the regularization parameter, and analyzing the accuracy of sketching methods used to accelerate computation. It provides precise theoretical results showing that sketching is surprisingly accurate, and it proposes a simple bias-correction for cross-validation.
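To make the cross-validation bias concrete, here is a minimal sketch under stated assumptions, not the paper's exact estimator. It assumes an isotropic Gaussian design, a ridge estimator normalized as $(X^\top X/n + \lambda I)^{-1} X^\top y/n$, and a signal with $\mathbb{E}\|\beta\|^2 = \alpha^2$, in which case the optimal penalty is $\lambda^* = (p/n)\,\sigma^2/\alpha^2$. Each $K$-fold training set has only $n(K-1)/K$ samples, so the CV-selected $\lambda$ targets a value inflated by roughly $K/(K-1)$; the hypothetical helper `bias_corrected_lambda` undoes that factor. All function names are illustrative.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate with the 1/n normalization: (X^T X / n + lam I)^{-1} X^T y / n."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def kfold_cv_lambda(X, y, lambdas, K=5, seed=0):
    """Return the lambda in `lambdas` with the smallest K-fold CV prediction error."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    errors = np.zeros(len(lambdas))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for j, lam in enumerate(lambdas):
            beta_hat = ridge(X[train_idx], y[train_idx], lam)
            errors[j] += np.mean((y[test_idx] - X[test_idx] @ beta_hat) ** 2)
    return lambdas[np.argmin(errors)]


def bias_corrected_lambda(lam_cv, K):
    """Hypothetical correction: CV fits use n(K-1)/K samples, so if the optimal
    lambda scales like p/n, the CV choice is inflated by K/(K-1); undo that factor."""
    return lam_cv * (K - 1) / K


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, K = 400, 200, 5
    alpha, sigma = 1.0, 1.0
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) * alpha / np.sqrt(p)  # E||beta||^2 = alpha^2
    y = X @ beta + sigma * rng.standard_normal(n)

    lambdas = np.linspace(0.05, 2.0, 60)
    lam_cv = kfold_cv_lambda(X, y, lambdas, K=K)
    lam_bc = bias_corrected_lambda(lam_cv, K)
    lam_star = (p / n) * sigma**2 / alpha**2  # oracle penalty under the isotropic model
    print(f"CV: {lam_cv:.3f}  corrected: {lam_bc:.3f}  oracle: {lam_star:.3f}")
```

With these settings the CV fits see $p/(0.8n) = 0.625$ instead of $p/n = 0.5$, so one would expect the raw CV choice to sit near 0.625 and the rescaled one near the oracle 0.5, up to sampling noise.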
We study the following three fundamental problems about ridge regression: (1) What is the structure of the estimator? (2) How should cross-validation be used to choose the regularization parameter? (3) How can computation be accelerated without losing too much accuracy? We study all three problems in a unified large-data linear model. We give a precise representation of ridge regression as a covariance-matrix-dependent linear combination of the true parameter and the noise. We study the bias of $K$-fold cross-validation for choosing the regularization parameter, and propose a simple bias-correction. We analyze the accuracy of primal and dual sketching for ridge regression, showing that they are surprisingly accurate. Our results are illustrated by simulations and by an analysis of empirical data.
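The structural claim and the two sketches can be illustrated numerically. The sketch below relies on assumptions of its own. It checks only the exact finite-sample identity $\hat\beta = (X^\top X/n + \lambda I)^{-1}(X^\top X/n)\,\beta + (X^\top X/n + \lambda I)^{-1} X^\top \varepsilon/n$, not the paper's asymptotic representation. It also assumes Gaussian sketches that replace only the Gram matrix: a primal sketch $S \in \mathbb{R}^{m \times n}$ that reduces the number of samples, and a dual sketch $R \in \mathbb{R}^{p \times d}$ applied inside the dual form $X^\top (XX^\top/n + \lambda I)^{-1} y/n$. The function names are hypothetical.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """Exact ridge, primal form: (X^T X / n + lam I_p)^{-1} X^T y / n."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def ridge_dual(X, y, lam):
    """Exact ridge, equivalent dual form: X^T (X X^T / n + lam I_n)^{-1} y / n."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T / n + lam * np.eye(n), y / n)


def primal_sketch(X, y, lam, m, rng):
    """Assumed primal sketch: Gaussian S (m x n) with E[S^T S] = I_n; only the Gram matrix is sketched."""
    n, p = X.shape
    SX = (rng.standard_normal((m, n)) / np.sqrt(m)) @ X
    return np.linalg.solve(SX.T @ SX / n + lam * np.eye(p), X.T @ y / n)


def dual_sketch(X, y, lam, d, rng):
    """Assumed dual sketch: Gaussian R (p x d) with E[R R^T] = I_p, used inside the dual Gram matrix."""
    n, p = X.shape
    XR = X @ (rng.standard_normal((p, d)) / np.sqrt(d))
    return X.T @ np.linalg.solve(XR @ XR.T / n + lam * np.eye(n), y / n)


def decompose(X, beta, eps, lam):
    """Split ridge into a part linear in the true beta and a part linear in the noise eps."""
    n, p = X.shape
    M = np.linalg.inv(X.T @ X / n + lam * np.eye(p))
    return M @ (X.T @ X / n) @ beta, M @ X.T @ eps / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, lam = 500, 100, 0.5
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)
    eps = rng.standard_normal(n)
    y = X @ beta + eps

    exact = ridge_primal(X, y, lam)
    signal, noise = decompose(X, beta, eps, lam)
    assert np.allclose(exact, signal + noise)  # structure: signal part + noise part
    assert np.allclose(exact, ridge_dual(X, y, lam))  # primal and dual forms agree

    def sq_err(b):
        return np.sum((b - beta) ** 2)

    print("exact ridge error:        ", sq_err(exact))
    print("primal sketch (m=200) err:", sq_err(primal_sketch(X, y, lam, 200, rng)))
    print("dual sketch (d=50) err:   ", sq_err(dual_sketch(X, y, lam, 50, rng)))
```

Comparing the three printed estimation errors for different sketch sizes `m` and `d` gives an informal feel for how much accuracy the sketches give up; it is a toy experiment, not a reproduction of the paper's simulations.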