Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning
This work addresses optimization efficiency for deep learning practitioners, but it is incremental: it modifies an existing algorithm (Adam) with a known technique (conjugate gradient).
The authors tackled the challenge of training deep neural networks by proposing CG-like-Adam, an optimization algorithm that integrates a conjugate-gradient-like method into Adam to speed up training and improve performance. Numerical experiments on the CIFAR10/100 datasets show its superiority.
Training deep neural networks is a challenging task. To speed up training and enhance the performance of deep neural networks, we modify the vanilla conjugate gradient method into a conjugate-gradient-like method and incorporate it into the generic Adam, which yields a new optimization algorithm for deep learning named CG-like-Adam. Specifically, both the first-order and the second-order moment estimates of generic Adam are computed from the conjugate-gradient-like direction instead of the raw gradient. The convergence analysis covers the cases in which the exponential moving average coefficient of the first-order moment estimate is constant and the first-order moment estimate is unbiased. Numerical experiments on the CIFAR10 and CIFAR100 datasets show the superiority of the proposed algorithm.
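To make the description concrete, here is a minimal Python sketch of one way a conjugate-gradient-like direction could feed both of Adam's moment estimates. It is not the authors' implementation: the function name `cg_like_adam`, the use of the Fletcher-Reeves coefficient, its damping factor `a / t**(1 + lam)`, the cap `gamma_cap`, and all default hyperparameters are assumptions chosen for illustration. As in the abstract, both moments are built from the CG-like direction `d` rather than the raw gradient, and `beta1` is held constant.

```python
import math
from typing import Callable, List, Optional


def cg_like_adam(
    grad_fn: Callable[[List[float]], List[float]],
    x0: List[float],
    lr: float = 0.1,
    beta1: float = 0.9,       # constant EMA coefficient for the first moment
    beta2: float = 0.999,
    eps: float = 1e-8,
    a: float = 1.0,           # assumed damping scale for the CG coefficient
    lam: float = 0.5,         # assumed damping exponent
    gamma_cap: float = 1.0,   # assumed cap on the Fletcher-Reeves ratio
    steps: int = 1000,
) -> List[float]:
    """Hypothetical sketch: Adam whose moments use a CG-like direction."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate of the CG-like direction
    v = [0.0] * len(x)  # second-moment estimate of the CG-like direction
    d_prev: Optional[List[float]] = None
    g_prev_sq = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        g_sq = sum(gi * gi for gi in g)
        if d_prev is None:
            d = list(g)  # first step: no previous direction, use the gradient
        else:
            # Fletcher-Reeves ratio ||g_t||^2 / ||g_{t-1}||^2, capped, then
            # damped by a / t^(1 + lam) so the conjugate term fades over time
            ratio = min(g_sq / max(g_prev_sq, 1e-12), gamma_cap)
            gamma = ratio * a / t ** (1 + lam)
            # d = -p, where p_t = -g_t + gamma * p_{t-1} is the CG search direction
            d = [gi + gamma * di for gi, di in zip(g, d_prev)]
        # Adam moment updates, fed with d instead of g
        m = [beta1 * mi + (1 - beta1) * di for mi, di in zip(m, d)]
        v = [beta2 * vi + (1 - beta2) * di * di for vi, di in zip(v, d)]
        # Bias corrections for the zero-initialized moments
        m_scale = 1.0 / (1.0 - beta1 ** t)
        v_scale = 1.0 / (1.0 - beta2 ** t)
        x = [
            xi - lr * (mi * m_scale) / (math.sqrt(vi * v_scale) + eps)
            for xi, mi, vi in zip(x, m, v)
        ]
        d_prev, g_prev_sq = d, g_sq
    return x


# Usage: minimize f(x) = (x0 - 3)^2 + (x1 + 2)^2, whose gradient is 2 * (x - c)
TARGET = [3.0, -2.0]


def quad_grad(x: List[float]) -> List[float]:
    return [2.0 * (xi - ci) for xi, ci in zip(x, TARGET)]


if __name__ == "__main__":
    print(cg_like_adam(quad_grad, [0.0, 0.0], steps=1000))
```

The capped, decaying coefficient is a safety choice in this sketch: it keeps the conjugate term from blowing up when the previous gradient is close to zero, and it lets the update approach plain Adam as `t` grows.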