LG | OC | ML | Aug 16, 2018

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

arXiv:1808.05671v4 | 162 citations
Originality: Incremental advance
AI Analysis

This paper offers theoretical insight for researchers and practitioners who use adaptive gradient methods in deep learning, though the advance is incremental because it builds on existing analyses.

The paper addresses convergence guarantees for adaptive gradient methods such as AMSGrad, RMSProp, and AdaGrad in nonconvex optimization. It proves that, in expectation, they converge to a first-order stationary point at a rate with better dependence on dimension than existing results, and it establishes the first high-probability bounds for these methods.

Adaptive gradient methods are workhorses in deep learning. However, the convergence guarantees of adaptive gradient methods for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods including AMSGrad, RMSProp and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods in expectation converge to a first-order stationary point. Our convergence rate is better than existing results for adaptive gradient methods in terms of dimension. In addition, we also prove high probability bounds on the convergence rates of AMSGrad, RMSProp as well as AdaGrad, which have not been established before. Our analyses shed light on better understanding the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
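
To make the abstract's terms concrete, the sketch below runs AMSGrad in its standard textbook form on a small smooth nonconvex function and tracks the smallest squared gradient norm seen, which is one common way to measure progress toward a first-order stationary point. It is an illustration only, not the paper's general framework or its analysis: the function names `amsgrad` and `toy_grad`, the toy objective, the noise level, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def amsgrad(grad_fn, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.99,
            eps=1e-8, noise=0.01, seed=0):
    """Textbook AMSGrad on noisy gradients (hypothetical helper).

    Returns the final iterate and the smallest squared norm of the
    true gradient observed along the trajectory.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)       # first-moment (momentum) estimate
    v = np.zeros_like(x)       # second-moment estimate
    v_hat = np.zeros_like(x)   # running elementwise max of v
    best_sq_norm = np.inf
    for _ in range(steps):
        true_grad = grad_fn(x)
        best_sq_norm = min(best_sq_norm, float(true_grad @ true_grad))
        # Simulated stochastic gradient: true gradient plus Gaussian noise.
        g = true_grad + noise * rng.standard_normal(x.shape)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # The max keeps the per-coordinate step size from growing.
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x, best_sq_norm

def toy_grad(x):
    # Gradient of f(x) = sum(x**2 + 3*sin(x)**2): smooth, and nonconvex
    # because the second derivative 2 + 6*cos(2x) changes sign.
    return 2 * x + 3 * np.sin(2 * x)

x_final, best_sq_norm = amsgrad(toy_grad, [2.0, -1.5, 0.5])
print(x_final, best_sq_norm)
```

Dropping the `np.maximum` line, so that `v` is used directly in the update, gives an RMSProp-style update with momentum; replacing the exponential average with a running sum of squared gradients gives an AdaGrad-style update.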

