LGAIIT | Jul 8, 2025

Simple Convergence Proof of Adam From a Sign-like Descent Perspective

arXiv:2507.05966v1 | 3 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This work provides a more accessible and rigorous theoretical foundation for Adam, benefiting researchers and practitioners in deep learning by clarifying convergence properties and offering tuning guidelines.

The paper simplifies the theoretical convergence analysis of the Adam optimizer by interpreting Adam as a sign-like method. Under mild conditions, it proves that Adam achieves the optimal convergence rate of O(1/T^{1/4}), improving on the previous rate of O(ln T/T^{1/4}).
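To get a feel for what removing the logarithmic factor means, the short sketch below evaluates both rate expressions for a few horizons T. It is only an illustration: all constants hidden by the O-notation are set to 1, which is an assumption made here and not something the paper states, and the function names are hypothetical.

```python
import math

def rate_without_log(T):
    # Rate of the form 1 / T^{1/4}, with the hidden constant assumed to be 1.
    return T ** -0.25

def rate_with_log(T):
    # Rate of the form ln(T) / T^{1/4}, with the hidden constant assumed to be 1.
    return math.log(T) * T ** -0.25

for T in (10**3, 10**6, 10**9):
    ratio = rate_with_log(T) / rate_without_log(T)
    print(f"T={T:>10}: 1/T^0.25={rate_without_log(T):.5f}  "
          f"lnT/T^0.25={rate_with_log(T):.5f}  ratio={ratio:.2f}")
```

The ratio between the two expressions is exactly ln T, so the gap between the bounds widens slowly as the number of iterations grows.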

Adam is widely recognized as one of the most effective optimizers for training deep neural networks (DNNs). Despite its remarkable empirical success, its theoretical convergence analysis remains unsatisfactory. Existing works predominantly interpret Adam as a preconditioned stochastic gradient descent with momentum (SGDM), formulated as $\bm{x}_{t+1} = \bm{x}_t - \frac{\gamma_t}{\sqrt{\bm{v}_t}+\epsilon} \circ \bm{m}_t$. This perspective necessitates strong assumptions and intricate techniques, resulting in lengthy and opaque convergence proofs that are difficult to verify and extend. In contrast, we propose a novel interpretation by treating Adam as a sign-like optimizer, expressed as $\bm{x}_{t+1} = \bm{x}_t - \gamma_t \frac{|\bm{m}_t|}{\sqrt{\bm{v}_t}+\epsilon} \circ {\rm Sign}(\bm{m}_t)$. This reformulation significantly simplifies the convergence analysis. For the first time, with some mild conditions, we prove that Adam achieves the optimal rate of ${\cal O}\left(\frac{1}{T^{\sfrac{1}{4}}}\right)$ rather than the previous ${\cal O}\left(\frac{\ln T}{T^{\sfrac{1}{4}}}\right)$ under weak assumptions of the generalized $p$-affine variance and $(L_0, L_1, q)$-smoothness, without dependence on the model dimensionality or the numerical stability parameter $\epsilon$. Additionally, our theoretical analysis provides new insights into the role of momentum as a key factor ensuring convergence and offers practical guidelines for tuning learning rates in Adam, further bridging the gap between theory and practice.
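The two update formulas in the abstract describe the same step, because each coordinate satisfies m = |m| * Sign(m). The NumPy sketch below checks this numerically for a single step. It is a minimal illustration under stated assumptions, not the paper's algorithm: the moment updates use the standard exponential moving averages with hypothetical defaults `beta1=0.9` and `beta2=0.999`, bias correction is omitted, and the function names and the toy quadratic objective are made up for this example.

```python
import numpy as np

def adam_moments(m, v, g, beta1=0.9, beta2=0.999):
    # Standard exponential moving averages of the gradient and its square
    # (assumed here; the abstract does not spell out the moment updates).
    m_new = beta1 * m + (1 - beta1) * g
    v_new = beta2 * v + (1 - beta2) * g**2
    return m_new, v_new

def step_preconditioned(x, m, v, lr, eps=1e-8):
    # Preconditioned-SGDM view: x - lr / (sqrt(v) + eps) * m
    return x - lr / (np.sqrt(v) + eps) * m

def step_sign_like(x, m, v, lr, eps=1e-8):
    # Sign-like view: a nonnegative per-coordinate scale times Sign(m).
    magnitude = np.abs(m) / (np.sqrt(v) + eps)
    return x - lr * magnitude * np.sign(m)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
m = np.zeros(5)
v = np.zeros(5)

# Noisy gradient of the toy objective 0.5 * ||x||^2 (illustrative only).
g = x + 0.1 * rng.normal(size=5)
m, v = adam_moments(m, v, g)

x_pre = step_preconditioned(x, m, v, lr=1e-3)
x_sign = step_sign_like(x, m, v, lr=1e-3)
print(np.allclose(x_pre, x_sign))
```

The two results should agree up to floating-point rounding. In the sign-like form the direction of each coordinate's move is carried entirely by Sign(m), while |m| / (sqrt(v) + eps) acts as a per-coordinate step-size multiplier.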

