ML LG · Sep 6, 2021

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

Yehuda Dar, Vidya Muthukumar, Richard G. Baraniuk

arXiv:2109.02355v1 · 26.580 citations

Originality: Synthesis-oriented

AI Analysis

This survey addresses a foundational problem in machine learning theory, offering researchers insight into overparameterization and generalization. Its contribution is incremental, since it synthesizes existing findings.

The paper tackles the puzzle of why overparameterized models, which perfectly fit (interpolate) noisy training data, can still generalize well, an observation that challenges the traditional bias-variance tradeoff. It surveys recent theoretical advances that explain this behavior from a statistical signal processing perspective, and it highlights the double descent phenomenon, in which highly overparameterized models often outperform the best underparameterized model on test data.

The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
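The overparameterized linear regression setting mentioned in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the authors' experiment. It assumes isotropic Gaussian features, a learner that uses only the first p of d true features, and the minimum l2-norm least-squares solution computed with the pseudoinverse. The function names (`min_norm_fit`, `double_descent_curve`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def min_norm_fit(X, y):
    # The pseudoinverse yields the ordinary least-squares solution when
    # p <= n and the minimum l2-norm interpolating solution when p > n.
    return np.linalg.pinv(X) @ y


def double_descent_curve(n=40, d=200, noise_std=0.5,
                         ps=(10, 20, 40, 80, 200), trials=50, seed=0):
    """Return {p: (mean train MSE, mean test MSE)} for each feature count p.

    Assumed setup (illustrative only): y = X @ beta + noise with d
    Gaussian features, while the learner sees only the first p of them.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d) / np.sqrt(d)  # fixed true coefficients
    results = {}
    for p in ps:
        train_mse, test_mse = [], []
        for _ in range(trials):
            X = rng.standard_normal((n, d))
            y = X @ beta + noise_std * rng.standard_normal(n)
            X_test = rng.standard_normal((1000, d))
            y_test = X_test @ beta  # noiseless test targets
            w = min_norm_fit(X[:, :p], y)
            train_mse.append(np.mean((X[:, :p] @ w - y) ** 2))
            test_mse.append(np.mean((X_test[:, :p] @ w - y_test) ** 2))
        results[p] = (float(np.mean(train_mse)), float(np.mean(test_mse)))
    return results


if __name__ == "__main__":
    for p, (tr, te) in double_descent_curve().items():
        print(f"p={p:4d}  train MSE={tr:.3e}  test MSE={te:.3e}")
```

In runs of this kind, the training error drops to numerical zero once p exceeds n (interpolation), and the test error typically peaks near p = n before decreasing again as p grows. Whether the overparameterized end of the curve beats the best underparameterized model depends on the signal and noise assumptions, which is the kind of question the surveyed theory characterizes.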
