ST IT LG SP ML · Oct 18, 2021

Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent

arXiv:2110.09502v18.69 citations

Originality: Incremental advance

AI Analysis

The paper addresses the problem of understanding interpolation in over-parameterized regimes, a question of interest to researchers in machine learning theory. It provides rigorous insights into risk behavior, though it is an incremental advance that builds on existing work on interpolators.

This paper tackles the theoretical understanding of minimum ℓ₁-norm interpolators in noisy sparse regression under Gaussian design, showing that their generalization risk exhibits a multi-descent phenomenon with phases of descent and ascent as model capacity increases, based on an exact characterization of risk behavior.
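For concreteness, the estimator in question can be written as below. The notation here ($X$, $y$, $\theta^\star$, $z$, $n$, $p$, $k$) is chosen for illustration and may differ from the paper's own symbols and normalization.

$$
y = X\theta^\star + z, \qquad
\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^{p}} \|\theta\|_{1}
\quad \text{subject to} \quad X\theta = y ,
$$

where $X \in \mathbb{R}^{n \times p}$ is a Gaussian design with $p > n$, $\theta^\star$ has $k$ nonzero entries, and $z$ is noise. In the regime described, $p/n$ and $k/p$ stay fixed as $n \to \infty$, which is what linear sparsity with proportional asymptotics means.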

An evolving line of machine learning work has observed empirical evidence suggesting that interpolating estimators -- the ones that achieve zero training error -- may not necessarily be harmful. This paper pursues theoretical understanding for an important type of interpolator: the minimum $\ell_1$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size). We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon; that is, the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the over-parameterization ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
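As a rough illustration of the object studied in the abstract, the minimum $\ell_1$-norm interpolator can be computed numerically as a linear program by splitting the coefficient vector into positive and negative parts. This is only a sketch: the function name `min_l1_interpolator`, the problem sizes, the noise level, and the $1/\sqrt{n}$ scaling of the design are assumptions made for the demo, not the paper's experimental setup.

```python
import numpy as np
from scipy.optimize import linprog


def min_l1_interpolator(X, y):
    """Solve min ||theta||_1 subject to X @ theta = y.

    Uses the standard LP reformulation theta = u - v with u, v >= 0,
    so that ||theta||_1 = sum(u) + sum(v) at the optimum.
    """
    n, p = X.shape
    cost = np.ones(2 * p)                # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])            # X @ (u - v) = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:p], res.x[p:]
    return u - v


# Illustrative (assumed) setup: n samples, p > n features, k-sparse signal.
rng = np.random.default_rng(0)
n, p, k, sigma = 50, 100, 5, 0.1
X = rng.normal(size=(n, p)) / np.sqrt(n)   # Gaussian design (scaling assumed)
theta_star = np.zeros(p)
theta_star[:k] = 1.0
y = X @ theta_star + sigma * rng.normal(size=n)

theta_hat = min_l1_interpolator(X, y)
theta_l2 = np.linalg.pinv(X) @ y           # minimum l2-norm interpolator, for contrast

train_residual = np.linalg.norm(X @ theta_hat - y)   # ~0: the fit interpolates
est_error = np.sum((theta_hat - theta_star) ** 2)    # squared estimation error
```

Repeating this for a range of `p` at fixed `n` and averaging `est_error` over random draws is one way to trace out a risk curve empirically. Whether a small simulation like this shows the multiple-descent shape depends on the sparsity and noise settings.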
