ST LG ML

Mar 19, 2019

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani

arXiv:1903.08560v548.4880 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides theoretical insight into overfitting and generalization in modern machine learning models such as neural networks, though it is incremental in that it formalizes known empirical observations.

The paper studies minimum-norm interpolators in high-dimensional regression, showing that they exhibit double-descent risk curves and can benefit from overparametrization, and it recovers these phenomena quantitatively in both linear and nonlinear feature models.

Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in {\mathbb R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in {\mathbb R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in {\mathbb R}^d$, $W \in {\mathbb R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
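The setup described in the abstract can be sketched numerically. The block below is a minimal illustration, not the authors' code: it computes the minimum $\ell_2$ norm least squares solution with the pseudoinverse, builds features under the linear model $x_i = \Sigma^{1/2} z_i$ and the nonlinear model $x_i = \varphi(W z_i)$, and estimates the excess prediction risk in the isotropic case $\Sigma = I$. The function names, the Gaussian choice for the i.i.d. entries, the $1/\sqrt{d}$ scaling of $W$, the ReLU activation, and the signal and noise levels are all assumptions made for illustration.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm least squares solution, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def linear_features(n, p, rng, Sigma_sqrt=None):
    """Linear model: rows x_i = Sigma^{1/2} z_i, with z_i having i.i.d. entries.

    Gaussian entries are an assumption; Sigma_sqrt=None means Sigma = I.
    """
    Z = rng.standard_normal((n, p))
    return Z if Sigma_sqrt is None else Z @ Sigma_sqrt.T


def nonlinear_features(n, p, d, rng, phi=lambda t: np.maximum(t, 0.0)):
    """Nonlinear model: rows x_i = phi(W z_i), phi applied componentwise.

    ReLU and the 1/sqrt(d) scaling of W are illustrative assumptions.
    """
    Z = rng.standard_normal((n, d))
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    return phi(Z @ W.T)


def excess_risk_isotropic(n, p, rng, signal_norm=1.0, noise_sd=1.0):
    """One draw of the excess prediction risk under the isotropic linear model.

    With Sigma = I, the excess risk at a fresh test point equals
    ||beta_hat - beta||^2.
    """
    beta = rng.standard_normal(p)
    beta *= signal_norm / np.linalg.norm(beta)  # fix the signal strength
    X = linear_features(n, p, rng)
    y = X @ beta + noise_sd * rng.standard_normal(n)
    beta_hat = min_norm_interpolator(X, y)
    return float(np.sum((beta_hat - beta) ** 2))


def mean_risk(n, p, trials=20, seed=0):
    """Average the excess risk over independent draws of (X, y, beta)."""
    rng = np.random.default_rng(seed)
    return float(np.mean([excess_risk_isotropic(n, p, rng) for _ in range(trials)]))


if __name__ == "__main__":
    n = 50
    for p in (10, 25, 45, 50, 55, 100, 200, 400):
        print(f"p/n = {p / n:4.1f}   mean excess risk = {mean_risk(n, p):.3f}")
```

Run as a script, this prints the averaged risk across a range of $p/n$ ratios. Under these assumptions one would expect the risk to peak near $p/n = 1$ and to fall again as $p$ grows past $n$, which is the "double descent" shape the abstract refers to. The exact values depend on the seed and on the chosen signal and noise levels.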
