On the Trend-corrected Variant of Adaptive Stochastic Optimization Methods
This work addresses slow convergence in deep learning optimization for practitioners who use adaptive methods; the contribution is incremental, as it builds on existing Adam-type frameworks.
The paper tackles the problem of slow convergence in Adam-type optimizers by introducing a trend-corrected variant that incorporates trend information into parameter updates, resulting in consistently faster convergence compared to conventional Adam and AMSGrad on classical models with real-world datasets.
Adam-type optimizers, a class of adaptive moment estimation methods based on an exponential moving average scheme, have been used successfully in many deep learning applications. These methods are appealing because they handle large-scale sparse datasets with high computational efficiency. In this paper, we present a new framework for Adam-type methods that incorporates trend information when updating the parameters with the adaptive step size and gradients. The additional terms in the algorithm are intended to make movement across a complex cost surface more efficient, so that the loss converges more rapidly. We show empirically the importance of the trend component: our framework consistently outperforms the conventional Adam and AMSGrad methods on classical models with several real-world datasets.
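To make the idea of a trend term concrete, the following is a minimal sketch and not the algorithm of the paper, whose exact update rule is not given in this abstract. It assumes the trend is tracked as an exponential moving average of successive changes in the first-moment estimate, in the style of double exponential smoothing, and is added to the bias-corrected first moment. The function name `trend_adam_sketch`, the smoothing parameter `gamma`, the choice to leave the trend without bias correction, and the toy quadratic are all hypothetical choices made for illustration.

```python
import numpy as np


def trend_adam_sketch(grad_fn, x0, steps=200, lr=0.05,
                      beta1=0.9, beta2=0.999, gamma=0.9,
                      eps=1e-8, use_trend=True):
    """Adam-style loop with an optional, assumed trend term on the first moment."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # level: EMA of gradients (as in Adam)
    v = np.zeros_like(x)  # EMA of squared gradients (as in Adam)
    b = np.zeros_like(x)  # trend: EMA of changes in the level (assumption)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m_prev = m
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Assumed trend update: smooth the step-to-step change of the level.
        b = gamma * b + (1.0 - gamma) * (m - m_prev)
        # Standard Adam bias correction for the two moment estimates.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # With use_trend=False this reduces to plain Adam.
        direction = m_hat + b if use_trend else m_hat
        x = x - lr * direction / (np.sqrt(v_hat) + eps)
    return x


# Hypothetical toy problem: f(x) = 0.5 * x^T A x with an ill-conditioned A.
A = np.diag([1.0, 10.0])
loss = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x_start = np.array([3.0, -2.0])
x_trend = trend_adam_sketch(grad, x_start)
x_plain = trend_adam_sketch(grad, x_start, use_trend=False)
print(loss(x_start), loss(x_plain), loss(x_trend))
```

Setting `use_trend=False` recovers the plain Adam update, which makes it easy to compare the two variants on the same problem. The toy run only illustrates the mechanics and says nothing about the empirical results reported here.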