LG · AI · ML · Mar 11, 2019

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

arXiv:1903.03614v1 · 58 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers a tutorial overview for practitioners, but it is incremental: it synthesizes existing methods without presenting new results.

This paper provides an introduction to gradient descent optimization algorithms for training deep neural networks, covering conventional methods and variants like Momentum, Adagrad, Adam, and Gadam.

In this paper, we aim to provide an introduction to gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to train. Most deep learning model training still relies on the back propagation algorithm. In back propagation, the model variables are updated iteratively until convergence using gradient descent based optimization algorithms. Besides the conventional vanilla gradient descent algorithm, many gradient descent variants have been proposed in recent years to improve learning performance, including Momentum, Adagrad, Adam, and Gadam, each of which is introduced in this paper.
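
To make the iterative update concrete, here is a minimal sketch of three of the update rules named in the abstract (vanilla gradient descent, Momentum, and Adam), applied to a toy quadratic loss. It is an illustration under stated assumptions, not the paper's own notation or code: the toy loss, the hyperparameter values, and the function names `loss`, `grad`, `vanilla_gd`, `momentum_gd`, and `adam` are all hypothetical, and the Momentum rule shown is only one common formulation. Adagrad and Gadam are left out for brevity.

```python
import math

# Toy loss (an assumption for illustration): f(w) = 0.5 * (1*w0^2 + 10*w1^2).
CURVATURE = (1.0, 10.0)

def loss(w):
    return 0.5 * sum(c * wi ** 2 for c, wi in zip(CURVATURE, w))

def grad(w):
    # Analytic gradient of the toy loss; in a real network this would
    # come from back propagation.
    return [c * wi for c, wi in zip(CURVATURE, w)]

def vanilla_gd(w, lr=0.05, steps=200):
    # w <- w - lr * gradient
    w = list(w)
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def momentum_gd(w, lr=0.05, beta=0.9, steps=200):
    # Velocity accumulates a decaying sum of past gradients
    # (one common formulation of Momentum).
    w = list(w)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        v = [beta * vi + lr * gi for vi, gi in zip(v, g)]
        w = [wi - vi for wi, vi in zip(w, v)]
    return w

def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    # Adam: bias-corrected running averages of the gradient (m)
    # and of the squared gradient (v) scale each coordinate's step.
    w = list(w)
    m = [0.0] * len(w)
    v = [0.0] * len(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi ** 2 for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        w = [wi - lr * mh / (math.sqrt(vh) + eps)
             for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w

if __name__ == "__main__":
    start = [1.0, 1.0]
    for name, optimizer in [("vanilla", vanilla_gd),
                            ("momentum", momentum_gd),
                            ("adam", adam)]:
        print(name, loss(optimizer(start)))
```

All three functions share the same loop shape: compute the gradient, transform it according to the rule, and subtract the result from the variables. A fixed step count stands in for a convergence check to keep the sketch short.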
