MLLGCADSOCOct 4, 2018

Convergence and Dynamical Behavior of the ADAM Algorithm for Non-Convex Stochastic Optimization

arXiv:1810.02263v4113 citations
Originality Highly original
AI Analysis

This provides theoretical guarantees for a widely used optimization algorithm in machine learning, addressing a known bottleneck in non-convex settings.

The paper establishes convergence of the ADAM algorithm for non-convex stochastic optimization, proving that iterates converge to stationary points under constant stepsize and introducing a novel decreasing stepsize version with almost sure convergence to critical points.

Adam is a popular variant of stochastic gradient descent for finding a local minimizer of a function. In the constant stepsize regime, assuming that the objective function is differentiable and non-convex, we establish the convergence in the long run of the iterates to a stationary point under a stability condition. The key ingredient is the introduction of a continuous-time version of Adam, under the form of a non-autonomous ordinary differential equation. This continuous-time system is a relevant approximation of the Adam iterates, in the sense that the interpolated Adam process converges weakly towards the solution to the ODE. The existence and the uniqueness of the solution are established. We further show the convergence of the solution towards the critical points of the objective function and quantify its convergence rate under a Lojasiewicz assumption. Then, we introduce a novel decreasing stepsize version of Adam. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes