LGOCFeb 9, 2022

On Almost Sure Convergence Rates of Stochastic Gradient Methods

arXiv:2202.04295v256 citations
Originality Incremental advance
AI Analysis

This work addresses the need for trajectory-wise convergence guarantees in stochastic optimization, which is crucial for ensuring reliable algorithm performance in practice, though it is incremental in extending known methods to almost sure analysis.

The paper tackles the problem of analyzing almost sure convergence rates for stochastic gradient methods, showing that for strongly convex functions, the rates are arbitrarily close to optimal, and for non-convex and weakly convex functions, it provides last-iterate convergence results, unlike most existing work that focuses on expectation-based analysis.

The vast majority of convergence rates analysis for stochastic gradient methods in the literature focus on convergence in expectation, whereas trajectory-wise almost sure convergence is clearly important to ensure that any instantiation of the stochastic algorithms would converge with probability one. Here we provide a unified almost sure convergence rates analysis for stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods. We show, for the first time, that the almost sure convergence rates obtained for these stochastic gradient methods on strongly convex functions, are arbitrarily close to their optimal convergence rates possible. For non-convex objective functions, we not only show that a weighted average of the squared gradient norms converges to zero almost surely, but also the last iterates of the algorithms. We further provide last-iterate almost sure convergence rates analysis for stochastic gradient methods on weakly convex smooth functions, in contrast with most existing results in the literature that only provide convergence in expectation for a weighted average of the iterates.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes