ML LG Feb 13

A Regularization-Sharpness Tradeoff for Linear Interpolators

arXiv:2602.12680v1
h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the need for new theoretical frameworks in overparameterized machine learning, though it builds directly on prior analyses of ridge regularizers.

The authors tackled the breakdown of the bias-variance tradeoff in overparameterized linear regression by proposing a regularization-sharpness tradeoff for linear interpolators with ℓ^p penalties, showing how this tradeoff can distinguish performant from weaker interpolators on real-world datasets.

The rule of thumb regarding the relationship between the bias-variance tradeoff and model size plays a key role in classical machine learning, but is now well known to break down in the overparameterized setting, as shown by the double descent curve. In particular, minimum-norm interpolating estimators can perform well, suggesting the need for a new tradeoff in these settings. Accordingly, we propose a regularization-sharpness tradeoff for overparameterized linear regression with an $\ell^p$ penalty. Inspired by the interpolating information criterion, our framework decomposes the selection penalty into a regularization term (quantifying the alignment of the regularizer and the interpolator) and a geometric sharpness term on the interpolating manifold (quantifying the effect of local perturbations), yielding a tradeoff analogous to bias-variance. Building on prior analyses that established this information criterion for ridge regularizers, we first provide a general expression of the interpolating information criterion for $\ell^p$ regularizers with $p \ge 2$. Subsequently, we extend this to the LASSO interpolator with an $\ell^1$ regularizer, which induces stronger sparsity. Empirical results on real-world datasets with random Fourier features and polynomials validate our theory, demonstrating how the tradeoff terms can distinguish performant linear interpolators from weaker ones.
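To make the setting concrete, here is a minimal sketch of overparameterized linear regression on random Fourier features. It is not the paper's method and does not compute the interpolating information criterion. It only builds two different interpolators of the same synthetic data and compares their $\ell^2$ norms. The data, the feature count, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: n samples, d inputs, D random features with D > n,
# so the linear system Phi @ theta = y is underdetermined.
n, d, D = 20, 3, 200

# Synthetic data (illustrative only, not from the paper).
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Random Fourier features: phi(x) = sqrt(2 / D) * cos(W x + b).
W = rng.normal(size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Minimum l2-norm interpolator, obtained from the pseudoinverse.
Phi_pinv = np.linalg.pinv(Phi)
theta_min = Phi_pinv @ y

# A second interpolator: add a component from the null space of Phi.
# The fitted values stay the same, only the coefficient norm grows.
z = rng.normal(size=D)
null_part = z - Phi_pinv @ (Phi @ z)
theta_other = theta_min + null_part

residual_min = np.max(np.abs(Phi @ theta_min - y))
residual_other = np.max(np.abs(Phi @ theta_other - y))
norm_min = np.linalg.norm(theta_min)
norm_other = np.linalg.norm(theta_other)

print("max residual (min-norm):", residual_min)
print("max residual (other):   ", residual_other)
print("l2 norm (min-norm):", norm_min)
print("l2 norm (other):   ", norm_other)
```

Both coefficient vectors fit the training data exactly, so training error alone cannot rank them. That is the situation in which a selection criterion over interpolators, such as the tradeoff proposed here, becomes relevant.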

