A new regret analysis for Adam-type algorithms
This work resolves a key theoretical limitation for practitioners using adaptive optimization methods in machine learning, though it is incremental in refining existing analysis frameworks.
The paper addresses the theory-practice gap in Adam-type algorithms by showing that the constant first-order moment parameters commonly used in practice can achieve optimal data-dependent regret bounds, without the decay schedules that earlier theoretical analyses required.
In this paper, we focus on a theory-practice gap for Adam and its variants (AMSGrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $β_{1}$ (typically between $0.9$ and $0.99$). In theory, existing regret guarantees for online convex optimization require a rapidly decaying schedule $β_{1}\to 0$. We show that this requirement is an artifact of the standard analysis, and we propose a novel framework that yields optimal, data-dependent regret bounds with a constant $β_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of algorithms and settings.
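To make the setting concrete, the following is a minimal sketch of an AMSGrad-style update run with a constant $β_{1}$ on a toy online convex problem. It is an illustration under stated assumptions, not the algorithm or experiments of the paper: the function name `amsgrad_constant_beta1`, the step size $\alpha/\sqrt{t}$, the box constraint $[-1, 1]^d$, the small `eps` term, and the fixed quadratic loss are all hypothetical choices made for this example.

```python
import numpy as np


def amsgrad_constant_beta1(grad_fn, x0, num_steps, alpha=0.1, beta1=0.9,
                           beta2=0.999, lo=-1.0, hi=1.0, eps=1e-8):
    """AMSGrad-style sketch with a constant beta1 (no decay schedule).

    All defaults are illustrative assumptions, not values from the paper.
    Returns the iterates x_1, ..., x_T that were played, one per row.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)       # first-moment (momentum) estimate
    v = np.zeros_like(x)       # second-moment estimate
    v_hat = np.zeros_like(x)   # running coordinate-wise maximum of v
    iterates = []
    for t in range(1, num_steps + 1):
        iterates.append(x.copy())
        g = grad_fn(x, t)
        m = beta1 * m + (1.0 - beta1) * g   # beta1 stays constant over t
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)
        x = x - (alpha / np.sqrt(t)) * m / (np.sqrt(v_hat) + eps)
        # For a box constraint and a diagonal weighting, the weighted
        # projection reduces to coordinate-wise clipping.
        x = np.clip(x, lo, hi)
    return np.array(iterates)


# Toy problem (assumption): the same quadratic loss at every round,
# f_t(x) = 0.5 * ||x - target||^2, whose minimizer lies inside the box.
target = np.array([0.5, -0.25])


def grad(x, t):
    return x - target


iterates = amsgrad_constant_beta1(grad, np.zeros(2), num_steps=2000)
losses = 0.5 * np.sum((iterates - target) ** 2, axis=1)
# The best fixed comparator is `target`, with zero loss, so the regret
# is simply the cumulative loss of the iterates.
regret = np.cumsum(losses)
avg_regret_200 = regret[199] / 200
avg_regret_2000 = regret[1999] / 2000
print(avg_regret_200, avg_regret_2000)
```

On this toy problem the average regret shrinks as the horizon grows even though $β_{1}$ is held at $0.9$ throughout. This only illustrates the behavior that the analysis is meant to explain; it is not evidence for the regret bound itself.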