LG MLFeb 9

All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension

arXiv:2602.08350v11.4h-index: 2

Originality Highly original

AI Analysis

This addresses foundational issues in machine learning theory by revealing limitations of standard optimization methods, with implications for algorithm design and generalization analysis.

The paper tackles the problem of sample complexity in stochastic convex optimization by showing that Empirical Risk Minimizers (ERMs) can overfit with linear dimension sample sizes, resolving an open question, and provides a generalization lower bound for Gradient Descent that narrows the gap with known upper bounds.

We study the sample complexity of the best-case Empirical Risk Minimizer in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. This resolves an open question by Feldman. We also extend this to approximate ERMs. Building on our construction we also show that (constrained) Gradient Descent potentially overfits when horizon and learning rate grow w.r.t sample size. Specifically we provide a novel generalization lower bound of $Ω\left(\sqrt{ηT/m^{1.5}}\right)$ for Gradient Descent, where $η$ is the learning rate, $T$ is the horizon and $m$ is the sample size. This narrows down, exponentially, the gap between the best known upper bound of $O(ηT/m)$ and existing lower bounds from previous constructions.

View on arXiv PDF

Similar