LGOCFeb 27, 2022

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

arXiv:2202.13328v27 citations
Originality Incremental advance
AI Analysis

This provides an efficient algorithm for machine learning practitioners dealing with linear prediction tasks, though it is incremental as it builds on prior work by improving iteration complexity in a specific setting.

The paper tackles the problem of stochastic convex optimization for generalized linear models, showing that early stopped Gradient Descent achieves excess error ε with optimal sample and iteration complexities of Õ(1/ε²), compared to Ω(1/ε⁴) in general settings.

We consider linear prediction with a convex Lipschitz loss, or more generally, stochastic convex optimization problems of generalized linear form, i.e.~where each instantaneous loss is a scalar convex function of a linear function. We show that in this setting, early stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $ε$ (compared to the best possible with unit Euclidean norm) with an optimal, up to logarithmic factors, sample complexity of $\tilde{O}(1/ε^2)$ and only $\tilde{O}(1/ε^2)$ iterations. This contrasts with general stochastic convex optimization, where $Ω(1/ε^4)$ iterations are needed Amir et al. [2021b]. The lower iteration complexity is ensured by leveraging uniform convergence rather than stability. But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $Θ(1/ε^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes