LG ML

Sep 9, 2019

A Stochastic Quasi-Newton Method with Nesterov's Accelerated Gradient

S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai

arXiv:1909.03621v1

10 citations

Originality: Incremental advance

AI Analysis

This work addresses optimization efficiency for neural network training, but it appears incremental because it combines two existing techniques: quasi-Newton methods and Nesterov acceleration.

The paper tackles the problem of improving convergence in large-scale non-convex optimization for neural networks by proposing a stochastic quasi-Newton method with Nesterov's accelerated gradient. On benchmark classification and regression tasks, the results show improved performance over the classical second-order methods oBFGS and oLBFGS and over popular first-order methods such as SGD and Adam.

Incorporating second-order curvature information in gradient-based methods has been shown to improve convergence drastically, despite its computational intensity. In this paper, we propose a stochastic (online) quasi-Newton method with Nesterov's accelerated gradient, in both its full and limited-memory forms, for solving large-scale non-convex optimization problems in neural networks. The performance of the proposed algorithm is evaluated in TensorFlow on benchmark classification and regression problems. The results show improved performance compared to the classical second-order oBFGS and oLBFGS methods and popular first-order stochastic methods such as SGD and Adam. The performance with different momentum rates and batch sizes is also illustrated.
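The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea, in NumPy rather than TensorFlow, on a toy least-squares problem. It assumes the commonly described Nesterov-accelerated quasi-Newton form: the mini-batch gradient is evaluated at the look-ahead point w + mu * v, scaled by a full-memory BFGS inverse-Hessian estimate H, and the curvature pair is computed on the same mini-batch. The function names, the step size, the momentum rate, and the curvature safeguard are all illustrative assumptions, not the authors' settings or implementation.

```python
import numpy as np


def half_mse(w, X, y):
    """Toy objective: 0.5 * mean squared residual of a linear model."""
    r = X @ w - y
    return 0.5 * float(np.mean(r ** 2))


def batch_grad(w, X, y):
    """Gradient of half_mse on the given (mini-)batch."""
    return X.T @ (X @ w - y) / len(y)


def stochastic_naq_sketch(X, y, mu=0.8, lr=0.1, batch_size=32,
                          steps=300, seed=0):
    """Illustrative stochastic quasi-Newton step with Nesterov look-ahead.

    Assumed form (not taken from the paper text):
      v <- mu * v - lr * H @ grad(w + mu * v)
      w <- w + v
    followed by a BFGS update of the inverse-Hessian estimate H.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)
    H = np.eye(d)          # full-memory inverse-Hessian estimate
    I = np.eye(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]

        w_look = w + mu * v                    # Nesterov look-ahead point
        g_look = batch_grad(w_look, Xb, yb)    # gradient at the look-ahead
        v = mu * v - lr * (H @ g_look)         # quasi-Newton momentum step
        w_new = w + v

        # Curvature pair on the SAME mini-batch, so that s and yk
        # describe one consistent sample of the objective.
        s = w_new - w_look
        yk = batch_grad(w_new, Xb, yb) - g_look
        sy = float(s @ yk)
        if sy > 1e-10:                         # assumed safeguard: skip bad pairs
            rho = 1.0 / sy
            V = I - rho * np.outer(s, yk)
            H = V @ H @ V.T + rho * np.outer(s, s)
        w = w_new
    return w
```

A limited-memory variant would replace the dense H with a short history of (s, yk) pairs and a two-loop recursion; that is omitted here to keep the sketch short. Varying `mu` and `batch_size` in this toy setting is a simple way to explore the kind of momentum-rate and batch-size sensitivity the abstract mentions.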
