LG · ML · Oct 27, 2019

An Adaptive and Momental Bound Method for Stochastic Learning

arXiv:1910.12249v1 · 54 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in stochastic optimization for deep learning, offering an incremental improvement over existing adaptive methods.

The authors address a failure mode of adaptive learning rate methods such as Adam: extremely large learning rates at the start of training, which inhibit learning. They propose AdaMod, a method that restricts adaptive learning rates with adaptive and momental upper bounds. Their experiments show that AdaMod eliminates these large learning rates and brings significant improvements over Adam on complex networks such as DenseNet and Transformer.

Training deep neural networks requires intricate initialization and careful selection of learning rates. The emergence of stochastic gradient optimization methods that use adaptive learning rates based on squared past gradients, e.g., AdaGrad, AdaDelta, and Adam, eases the job slightly. However, recent studies have shown that such methods have pitfalls of their own, including non-convergence issues. Alternative variants have been proposed as enhancements, such as AMSGrad, AdaShift, and AdaBound. In this work, we identify a new problem of adaptive learning rate methods that appears at the beginning of training: Adam produces extremely large learning rates that inhibit the start of learning. We propose the Adaptive and Momental Bound (AdaMod) method to restrict the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpectedly large learning rates and stabilize the training of deep neural networks. Our experiments verify that AdaMod eliminates the extremely large learning rates throughout training and brings significant improvements over Adam, especially on complex networks such as DenseNet and Transformer. Our implementation is available at: https://github.com/lancopku/AdaMod
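
The bounding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes Adam-style moment estimates with bias correction, and the names `adamod_step`, `init_state`, and the smoothing coefficient `beta3` are assumptions made for this example. The per-parameter rate is smoothed with an exponential moving average, and the rate actually applied is the smaller of the raw and smoothed values.

```python
import math


def init_state(n):
    """Zero-initialised optimizer state for n scalar parameters (assumed layout)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "s": [0.0] * n}


def adamod_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                beta3=0.999, eps=1e-8):
    """One illustrative update over flat lists of floats.

    Returns the new parameters and the bounded per-parameter learning rates.
    """
    state["t"] += 1
    t = state["t"]
    new_params, rates = [], []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Adam-style first and second moment estimates.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)

        # Raw adaptive learning rate, as Adam would apply it.
        eta = lr / (math.sqrt(v_hat) + eps)

        # Momental bound: exponential moving average of the rates themselves.
        state["s"][i] = beta3 * state["s"][i] + (1 - beta3) * eta

        # The applied rate never exceeds its own smoothed history.
        eta_bounded = min(eta, state["s"][i])

        rates.append(eta_bounded)
        new_params.append(p - eta_bounded * m_hat)
    return new_params, rates


if __name__ == "__main__":
    state = init_state(1)
    params, rates = adamod_step([1.0], [1e-6], state)
    print(params, rates)
```

In this sketch the moving average starts at zero, which is an assumption of the example; with a very small first gradient the raw rate is large, while the bounded rate is only a `1 - beta3` fraction of it.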

Code Implementations: 2 repos
