Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
For statisticians and machine learning practitioners, this paper addresses the unrealistic error assumptions used in prior research on ridgeless least squares. Its contribution is incremental: it extends existing analyses to more general error settings.
The paper analyzes the prediction and estimation risks of the ridgeless least squares estimator under more general regression error assumptions, such as clustered or serial dependence. It finds that the benefits of overparameterization can extend to time series, panel, and grouped data, and that the variance components of both risks can be summarized through the trace of the variance-covariance matrix of the regression errors.
In recent years, there has been a significant growth in research focusing on minimum $\ell_2$ norm (ridgeless) interpolation least squares estimators. However, the majority of these analyses have been limited to an unrealistic regression error structure, assuming independent and identically distributed errors with zero mean and common variance. In this paper, we explore prediction risk as well as estimation risk under more general regression error assumptions, highlighting the benefits of overparameterization in a more realistic setting that allows for clustered or serial dependence. Notably, we establish that the estimation difficulties associated with the variance components of both risks can be summarized through the trace of the variance-covariance matrix of the regression errors. Our findings suggest that the benefits of overparameterization can extend to time series, panel and grouped data.
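To make the objects in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the ridgeless estimator is computed as $\hat\beta = X^{+} y$ with the Moore-Penrose pseudoinverse, and it uses a hypothetical AR(1)-style error covariance $\Omega_{ij} = \sigma^2 \rho^{|i-j|}$ as one example of serial dependence. The Gaussian design, the dimensions, and the helper names `ridgeless_fit`, `ar1_covariance`, and `estimation_variance` are illustrative assumptions. The quantity $\operatorname{tr}(X^{+} \Omega X^{+\top})$ is the conditional variance part of $E\|\hat\beta - \beta\|^2$ given $X$.

```python
import numpy as np


def ridgeless_fit(X, y):
    """Minimum l2-norm least squares solution, beta_hat = X^+ y."""
    return np.linalg.pinv(X) @ y


def ar1_covariance(n, rho, sigma2=1.0):
    """Covariance with entries sigma2 * rho^|i-j| (illustrative serial dependence)."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])


def estimation_variance(X, Omega):
    """tr(X^+ Omega X^+T): variance part of E||beta_hat - beta||^2 given X."""
    X_pinv = np.linalg.pinv(X)
    return np.trace(X_pinv @ Omega @ X_pinv.T)


rng = np.random.default_rng(0)
n, p = 20, 100  # overparameterized regime: p > n

X = rng.standard_normal((n, p))             # assumed Gaussian design
beta = rng.standard_normal(p) / np.sqrt(p)  # assumed true coefficients

# Serially dependent errors drawn with covariance Omega.
Omega = ar1_covariance(n, rho=0.5)
errors = np.linalg.cholesky(Omega) @ rng.standard_normal(n)
y = X @ beta + errors

beta_hat = ridgeless_fit(X, y)

# With p > n and full row rank, the estimator interpolates the training data.
interpolates = np.allclose(X @ beta_hat, y)

# Variance terms under dependent errors and under i.i.d. errors
# with the same trace (tr(Omega) = n in both cases).
var_dependent = estimation_variance(X, Omega)
var_iid = estimation_variance(X, np.eye(n))

print(interpolates, var_dependent, var_iid)
```

For a single design draw the two variance terms generally differ. The abstract's trace result concerns the risks themselves under the paper's assumptions, which this sketch does not verify.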