ML LG ST

Feb 20, 2022

Memorize to Generalize: on the Necessity of Interpolation in High Dimensional Linear Regression

Chen Cheng, John Duchi, Rohith Kuditipudi

arXiv:2202.09889v211.614 citations

Originality: Incremental advance

AI Analysis

The paper addresses a foundational question in machine learning theory, the relationship between overparameterization and generalization, with implications for model design and training.

The paper examines whether interpolation is necessary in overparameterized linear regression. It shows that optimal predictive risk requires nearly interpolating the training data, and that excess prediction error grows at least linearly in the training error once the training error exceeds a threshold of order $\sigma^4$.

We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple overparameterized linear regression $y = X\theta + w$ with random design $X \in \mathbb{R}^{n \times d}$ under the proportional asymptotics $d/n \to \gamma \in (1, \infty)$. We precisely characterize how prediction (test) error necessarily scales with training error in this setting. An implication of this characterization is that as the label noise variance $\sigma^2 \to 0$, any estimator that incurs at least $\mathsf{c}\sigma^4$ training error for some constant $\mathsf{c}$ is necessarily suboptimal and will suffer growth in excess prediction error at least linear in the training error. Thus, optimal performance requires fitting training data to substantially higher accuracy than the inherent noise floor of the problem.
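The setting in the abstract can be explored numerically. The sketch below is illustrative and is not the paper's construction: it assumes an isotropic Gaussian design, a signal $\theta$ with norm close to 1, and ridge regression as a convenient family of estimators whose training error can be tuned through a penalty `lam`. The function name `ridge_errors` and all parameter values are hypothetical. For each penalty it reports the training error and the excess prediction risk, so the two can be compared as the fit moves away from interpolation.

```python
import numpy as np


def ridge_errors(n=100, d=200, sigma=0.1,
                 lams=(0.0, 1e-3, 1e-2, 1e-1, 1.0), seed=0):
    """Return (train_errors, excess_risks) for ridge fits at each penalty.

    Assumptions (not taken from the paper): isotropic Gaussian design,
    Gaussian noise, and a random signal theta with norm close to 1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))               # design, d/n = gamma > 1
    theta = rng.standard_normal(d) / np.sqrt(d)   # true parameter
    y = X @ theta + sigma * rng.standard_normal(n)

    gram = X @ X.T  # n x n, invertible almost surely when d > n
    train_errors, excess_risks = [], []
    for lam in lams:
        # Dual form of ridge: theta_hat = X^T (X X^T + lam * n * I)^{-1} y.
        # lam = 0 gives the minimum-norm interpolator.
        alpha = np.linalg.solve(gram + lam * n * np.eye(n), y)
        theta_hat = X.T @ alpha
        # Mean squared residual on the training data.
        train_errors.append(float(np.mean((X @ theta_hat - y) ** 2)))
        # With identity covariance, the excess prediction risk on a fresh
        # point equals the squared parameter error.
        excess_risks.append(float(np.sum((theta_hat - theta) ** 2)))
    return train_errors, excess_risks


if __name__ == "__main__":
    lams = (0.0, 1e-3, 1e-2, 1e-1, 1.0)
    train, excess = ridge_errors(lams=lams)
    for lam, tr, ex in zip(lams, train, excess):
        print(f"lam={lam:g}  train_mse={tr:.3e}  excess_risk={ex:.3e}")
```

Setting `sigma` to smaller values and comparing the training error at each penalty against $\sigma^4$ gives a rough, finite-sample view of the regime the abstract describes. The asymptotic statements themselves are not verified by a simulation of this size.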
