LG ML Nov 5, 2024

ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate

Shohei Taniguchi, Keno Harada, Gouki Minegishi, Yuta Oshima, Seong Cheol Jeong, Go Nagahara, Tomoshi Iiyama, Masahiro Suzuki, Yusuke Iwasawa, Yutaka Matsuo

arXiv:2411.02853v318.222 citations h-index: 20 Has Code NIPS

Originality: Incremental advance

AI Analysis

This paper addresses a foundational problem in deep learning optimization by providing a theoretically sound and practical alternative to Adam, though it is an incremental improvement over existing variants.

The paper tackles the theoretical non-convergence of the Adam optimizer by proposing ADOPT, a modified version that achieves the optimal convergence rate of O(1/√T) for any choice of β₂ without relying on the impractical bounded-noise assumption. Experiments show superior results compared to Adam and its variants on tasks such as image classification and NLP.

Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless a hyperparameter, namely $β_2$, is chosen in a problem-dependent manner. There have been many attempts to fix the non-convergence (e.g., AMSGrad), but they require the impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O} ( 1 / \sqrt{T} )$ with any choice of $β_2$ without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. We also conduct intensive numerical experiments and verify that ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. The implementation is available at https://github.com/iShohei220/adopt.
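The two changes named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the linked repository for that). The function names `adopt_step` and `minimize_quadratic`, the default hyperparameter values, the `max(sqrt(v), eps)` guard, and the initialization of the second moment to the squared initial gradient are all assumptions made for illustration. The abstract itself states only that the current gradient is excluded from the second moment estimate used for normalization and that normalization happens before the momentum update.

```python
import math


def adopt_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    """One ADOPT-style update on lists of floats (illustrative sketch only)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Change 1: normalize by the PREVIOUS second-moment estimate v_i,
        # which does not yet contain the current gradient g.
        g_hat = g / max(math.sqrt(v_i), eps)  # eps guard is an assumption
        # Change 2: momentum is accumulated AFTER normalization,
        # whereas Adam normalizes the momentum itself.
        m_i = beta1 * m_i + (1.0 - beta1) * g_hat
        new_theta.append(p - lr * m_i)
        new_m.append(m_i)
        # Only now is the current gradient folded into the second moment.
        new_v.append(beta2 * v_i + (1.0 - beta2) * g * g)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy usage: minimize f(x) = 0.5 * x**2 (gradient is x) from x = 5.0."""
    theta = [5.0]
    m = [0.0]
    v = [x * x for x in theta]  # assumed initialization: squared initial gradient
    for _ in range(steps):
        grad = theta[:]  # gradient of 0.5 * x**2 is x
        theta, m, v = adopt_step(theta, grad, m, v, lr=lr)
    return theta[0]
```

Calling `minimize_quadratic()` drives the iterate from 5.0 toward the minimizer at 0 on this toy problem. That only shows the update direction is sensible; it says nothing about the convergence guarantee, which is a statement about stochastic nonconvex optimization.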
