LGOCMLFeb 17, 2023

SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance

arXiv:2302.08783v240 citationsh-index: 31
AI Analysis

This work addresses the need for robust and adaptive optimization methods in machine learning, offering theoretical guarantees for practitioners dealing with unbounded gradients and unknown parameters, though it is incremental in improving analysis rather than introducing a new method.

The paper tackles the limitations of existing analyses for Stochastic Gradient Descent with AdaGrad stepsizes by providing a comprehensive analysis without assumptions on problem parameters, global Lipschitz conditions, or high-probability bounds, achieving sharp convergence rates in both convex and non-convex cases under an affine variance noise model.

We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general ``affine variance'' noise model and provides sharp rates of convergence in both the low-noise and high-noise~regimes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes