OCPRMLDec 7, 2020

Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

arXiv:2012.04002v316 citations
Originality Highly original
AI Analysis

This work provides theoretical convergence guarantees and trap avoidance for a broad class of stochastic optimization algorithms, which is significant for researchers and practitioners using these methods, especially S-NAG in non-convex settings.

This paper analyzes a general stochastic optimization procedure that unifies several stochastic gradient descent variants, including stochastic heavy ball, S-NAG, and Adam. It establishes the stability and almost sure convergence of iterates to critical points for non-convex differentiable objective functions, and provides a convergence rate via a Central Limit Theorem. The algorithm is also shown to avoid undesired critical points like local maxima or saddle points.

In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided under the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes