LG NE ML | Dec 23, 2014

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

arXiv:1412.7419v5 | 21 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of manual learning rate selection in large-scale machine learning, offering a potentially more robust optimization method for practitioners, though it appears incremental as it builds on existing adaptive techniques.

The paper tackles the problem of tuning learning rates in stochastic gradient descent by proposing ADASECANT, an adaptive learning rate algorithm that uses curvature information from gradient statistics and includes a variance reduction technique. Preliminary experiments with deep neural networks show improved performance over popular stochastic gradient methods.

Stochastic gradient algorithms have been the main focus of large-scale learning problems, and they have led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and on the amount of noise in the stochastic estimates of the gradients. In this paper, we propose a new adaptive learning rate algorithm that uses curvature information to tune the learning rates automatically. Information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first-order gradients. We further propose a new variance reduction technique to speed up convergence. In our preliminary experiments with deep neural networks, we obtained better performance than the popular stochastic gradient algorithms.
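
The idea of reading element-wise curvature off first-order gradients can be illustrated with a minimal sketch. This is not the ADASECANT algorithm itself, whose update rules are not given on this page; it is a simplified secant-style scheme under stated assumptions: curvature along coordinate i is approximated by the ratio of running averages of |Δg_i| and |Δθ_i|, and the per-coordinate learning rate is its inverse. The names `secant_sgd` and `quadratic_grad`, and the parameters `lr0`, `decay`, and `eps`, are hypothetical and chosen for illustration only.

```python
def secant_sgd(grad_fn, theta0, steps=50, lr0=1e-3, decay=0.9, eps=1e-12):
    """Gradient descent with per-coordinate step sizes from a secant estimate.

    Illustrative sketch only; not the update rule from the paper.
    """
    theta = list(theta0)
    g_prev = grad_fn(theta)
    # Bootstrap: one small plain-gradient step so a first difference exists.
    step = [-lr0 * g for g in g_prev]
    theta = [t + s for t, s in zip(theta, step)]
    avg_dtheta = [abs(s) for s in step]  # running mean of |delta theta_i|
    avg_dg = None                        # running mean of |delta g_i|
    for _ in range(steps):
        g = grad_fn(theta)
        dg = [abs(a - b) for a, b in zip(g, g_prev)]
        if avg_dg is None:
            avg_dg = dg
        else:
            avg_dg = [decay * m + (1 - decay) * d for m, d in zip(avg_dg, dg)]
        # Secant estimate: curvature_i ~ |dg_i| / |dtheta_i|,
        # so the learning rate is lr_i ~ |dtheta_i| / |dg_i|.
        lr = [dt / (d + eps) for dt, d in zip(avg_dtheta, avg_dg)]
        step = [-l * gi for l, gi in zip(lr, g)]
        theta = [t + s for t, s in zip(theta, step)]
        avg_dtheta = [decay * m + (1 - decay) * abs(s)
                      for m, s in zip(avg_dtheta, step)]
        g_prev = g
    return theta


def quadratic_grad(theta, curvatures=(1.0, 100.0)):
    # Gradient of f(theta) = 0.5 * sum(a_i * theta_i**2), a toy ill-conditioned loss.
    return [a * t for a, t in zip(curvatures, theta)]


theta_final = secant_sgd(quadratic_grad, [1.0, 1.0])
print(theta_final)
```

On this noiseless quadratic the estimated learning rates approach 1/a_i per coordinate, so both coordinates converge quickly despite the 100x difference in curvature. With genuinely stochastic gradients the raw difference ratio would be dominated by noise, which is presumably where the gradient statistics and variance reduction described in the abstract come in; that part is not reproduced here.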
