LG · ML · Oct 16, 2018

Quasi-hyperbolic momentum and Adam for deep learning

arXiv:1810.06801v4 · 155 citations
Originality: Incremental advance
AI Analysis

This work targets optimizer efficiency for deep learning practitioners, offering an incremental improvement over momentum SGD and Adam that is simple to adopt.

The authors aim to improve momentum-based acceleration of stochastic gradient descent for deep learning. They propose quasi-hyperbolic momentum (QHM) and its Adam variant, QHAdam, and report a new state-of-the-art result on WMT16 EN-DE.

Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
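The abstract's description of QHM as "averaging a plain SGD step with a momentum step" can be illustrated with a minimal sketch. The names below (`qhm_step`, `minimize_quadratic`, `lr`, `beta`, `nu`) and the hyperparameter values are assumptions chosen for illustration, not taken from the text above: `beta` is the decay of an exponentially averaged gradient buffer, and `nu` is the weight given to the momentum step in the average. The toy quadratic objective is likewise only a stand-in.

```python
def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    """One QHM-style update on lists of floats (illustrative sketch).

    buf is an exponential moving average of past gradients (the momentum
    buffer). The parameter update is a weighted average of the raw gradient
    (plain SGD step) and the buffer (momentum step), with weight nu on the
    momentum part. Returns (new_theta, new_buf).
    """
    # Update the momentum buffer: EMA of gradients with decay beta.
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    # Average the plain gradient and the momentum buffer, then step.
    new_theta = [
        t - lr * ((1.0 - nu) * g + nu * b)
        for t, g, b in zip(theta, grad, new_buf)
    ]
    return new_theta, new_buf


def minimize_quadratic(steps=200, lr=0.1, beta=0.9, nu=0.7):
    """Run the sketch on the toy objective f(x) = 0.5 * sum(x_i ** 2)."""
    theta = [3.0, -2.0]   # arbitrary starting point (assumption)
    buf = [0.0, 0.0]      # momentum buffer starts at zero
    for _ in range(steps):
        grad = list(theta)  # gradient of 0.5 * x^2 is x
        theta, buf = qhm_step(theta, grad, buf, lr, beta, nu)
    return theta
```

In this sketch, setting `nu=0.0` reduces the update to a plain SGD step, and `nu=1.0` reduces it to a step along the momentum buffer alone; intermediate values interpolate between the two.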

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
