LG · OC · Oct 3, 2025

How to Set $β_1, β_2$ in Adam: An Online Learning Perspective

arXiv:2510.03478v1 · 1 citation · h-index: 1
AI Analysis

This work provides theoretical insights for machine learning practitioners on tuning Adam parameters, though it is incremental as it builds on prior online learning frameworks.

The paper tackles the problem of optimally setting the momentum factors β₁ and β₂ in the Adam optimizer. It derives new theoretical analyses that generalize existing bounds to the cases where β₁ ≠ √β₂ and proves that the resulting bounds are tight in the worst case.

While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $β_1$ and $β_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The prior analyses in these works required setting $β_1 = \sqrt{β_2}$, which does not cover the more practical cases with $β_1 \neq \sqrt{β_2}$. We derive novel, more general analyses that hold for both $β_1 \geq \sqrt{β_2}$ and $β_1 \leq \sqrt{β_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $β_1 = \sqrt{β_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.
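
To make the roles of $β_1$ and $β_2$ concrete, the following is a minimal sketch of a textbook Adam step on a scalar parameter, together with a helper that reports which of the regimes discussed above a given pair falls into. It assumes the standard bias-corrected update, which may differ from the exact FTRL formulation analyzed in the paper; the function names `adam_step`, `coupled_beta1`, and `regime` are hypothetical and chosen only for illustration.

```python
import math


def coupled_beta1(beta2):
    """Return beta1 = sqrt(beta2), the coupling required by the prior analyses."""
    return math.sqrt(beta2)


def regime(beta1, beta2, tol=1e-12):
    """Report how beta1 compares with sqrt(beta2) (hypothetical helper)."""
    root = math.sqrt(beta2)
    if abs(beta1 - root) <= tol:
        return "beta1 == sqrt(beta2)"
    return "beta1 > sqrt(beta2)" if beta1 > root else "beta1 < sqrt(beta2)"


def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update on a scalar parameter x with gradient g.

    Assumption: standard bias-corrected Adam, not necessarily the variant
    studied in the paper. t is the 1-based step counter.
    """
    m = beta1 * m + (1.0 - beta1) * g        # first-moment (momentum) average
    v = beta2 * v + (1.0 - beta2) * g * g    # second-moment average
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


if __name__ == "__main__":
    # Commonly used defaults: beta1 = 0.9, beta2 = 0.999.
    print(regime(0.9, 0.999))
    # The coupled choice covered by the earlier analyses.
    print(regime(coupled_beta1(0.999), 0.999))
```

With the commonly used defaults $β_1 = 0.9$ and $β_2 = 0.999$, $\sqrt{β_2} \approx 0.9995$, so the default pair sits in the $β_1 < \sqrt{β_2}$ regime rather than at the coupled setting.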

