Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning
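This work targets training efficiency, a practical concern for deep learning practitioners, but its contribution is incremental because it combines existing methods (Adam and nonlinear conjugate gradient) rather than introducing a new one.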
The paper tackled the problem of slow training in deep neural networks by proposing a conjugate-gradient-based Adam algorithm, which reduced the number of training epochs compared to existing adaptive stochastic optimization methods in text and image classification tasks.
This paper proposes a conjugate-gradient-based Adam algorithm that blends Adam with nonlinear conjugate gradient methods and presents its convergence analysis. Numerical experiments on text classification and image classification show that the proposed algorithm can train deep neural network models in fewer epochs than existing adaptive stochastic optimization algorithms.
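To make the idea of blending Adam with a nonlinear conjugate gradient direction more concrete, the following is a minimal sketch and not the paper's exact update rule. It assumes a Fletcher-Reeves coefficient, clips that coefficient to [0, 1], and damps it by 1/t; these safeguards, the function name `cg_adam_minimize`, the hyperparameter defaults, and the toy quadratic are all illustrative choices. The only change from plain Adam is that the first moment averages a conjugate-gradient-style search direction instead of the raw gradient.

```python
import math


def cg_adam_minimize(grad, x0, steps=2000, lr=0.05,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Illustrative Adam variant whose first moment tracks a CG-style direction."""
    x = list(x0)
    n = len(x)
    m = [0.0] * n        # first moment of the search direction
    v = [0.0] * n        # second moment of the raw gradient
    d_prev = [0.0] * n   # previous search direction
    g_prev_sq = None     # squared norm of the previous gradient

    for t in range(1, steps + 1):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)

        # Fletcher-Reeves coefficient, clipped to [0, 1] and damped by 1/t.
        # Both safeguards are assumptions of this sketch.
        if g_prev_sq is None or g_prev_sq == 0.0:
            gamma = 0.0
        else:
            gamma = min(g_sq / g_prev_sq, 1.0) / t

        # Conjugate-gradient-style search direction (a descent direction).
        d = [-gi + gamma * di for gi, di in zip(g, d_prev)]

        # Standard Adam moment updates with bias correction; the first
        # moment averages d, the second moment averages the squared gradient.
        m = [beta1 * mi + (1 - beta1) * di for mi, di in zip(m, d)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]

        # d already points downhill, so the scaled step is added to x.
        x = [xi + lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
        d_prev, g_prev_sq = d, g_sq
    return x


# Toy problem (hypothetical): an ill-conditioned quadratic with minimum at (3, -1).
def loss(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2


def loss_grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]


x_final = cg_adam_minimize(loss_grad, [0.0, 0.0])
print(x_final, loss(x_final))
```

Setting `gamma` to zero on every step recovers ordinary Adam, which makes it easy to compare the two on the same problem. The deterministic quadratic only illustrates the mechanics and says nothing about the epoch counts reported for the text and image classification experiments.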