LG ML

Oct 17, 2019

A Stochastic Variance Reduced Nesterov's Accelerated Quasi-Newton Method

Sota Yasuda, Shahrzad Mahboubi, S. Indrapriyadarsini, Hiroshi Ninomiya, Hideki Asai

arXiv:1910.07939v1

7 citations

Originality: Incremental advance

AI Analysis

This is an incremental improvement for training large-scale neural networks: it targets the stochastic variance noise that limits the stochastic version of the NAQ optimizer.

The paper addresses the high stochastic variance noise of the stochastic Nesterov's Accelerated Quasi-Newton method for training neural networks. It proposes a stochastic variance reduced version and reports improved performance on four benchmark problems.

Recently, algorithms incorporating second-order curvature information have become popular in training neural networks. Nesterov's Accelerated Quasi-Newton (NAQ) method has been shown to effectively accelerate the BFGS quasi-Newton method by incorporating a momentum term and Nesterov's accelerated gradient vector. A stochastic version of the NAQ method was proposed for training on large-scale problems. However, this method incurs high stochastic variance noise. This paper proposes a stochastic variance reduced Nesterov's Accelerated Quasi-Newton method in full (SVR-NAQ) and limited (SVRLNAQ) memory forms. The performance of the proposed method is evaluated in TensorFlow on four benchmark problems: two regression and two classification problems. The results show improved performance compared to conventional methods.
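The abstract does not say how the variance reduction is done, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes an SVRG-style control variate (a per-sample gradient corrected by a full gradient taken at a periodic snapshot) evaluated at the Nesterov look-ahead point `w + momentum * v`. The quasi-Newton curvature matrix is deliberately replaced by the identity, so the sketch reduces to variance reduced Nesterov momentum. The least-squares loss and all function names and hyperparameters are hypothetical.

```python
import random


def grad_i(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w . x - y)^2 with respect to w."""
    err = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [err * xj for xj in x]


def full_grad(w, data):
    """Average of the per-sample gradients over the whole data set."""
    n = len(data)
    g = [0.0] * len(w)
    for x, y in data:
        g = [a + b / n for a, b in zip(g, grad_i(w, x, y))]
    return g


def vr_grad(w, w_snap, g_snap, sample):
    """SVRG-style estimate (assumed): g_i(w) - g_i(w_snap) + full_grad(w_snap)."""
    x, y = sample
    g_new = grad_i(w, x, y)
    g_old = grad_i(w_snap, x, y)
    return [a - b + c for a, b, c in zip(g_new, g_old, g_snap)]


def svr_nesterov(data, dim, epochs=30, lr=0.1, momentum=0.5, seed=0):
    """Variance reduced Nesterov momentum; hyperparameters are illustrative."""
    rng = random.Random(seed)
    w = [0.0] * dim
    v = [0.0] * dim
    for _ in range(epochs):
        # Refresh the snapshot and its full gradient once per epoch.
        w_snap = list(w)
        g_snap = full_grad(w_snap, data)
        for _ in range(len(data)):
            # Nesterov look-ahead point where the gradient is evaluated.
            look = [wj + momentum * vj for wj, vj in zip(w, v)]
            g = vr_grad(look, w_snap, g_snap, rng.choice(data))
            # A quasi-Newton method would scale g by an inverse-Hessian
            # approximation here; this sketch uses the identity instead.
            v = [momentum * vj - lr * gj for vj, gj in zip(v, g)]
            w = [wj + vj for wj, vj in zip(w, v)]
    return w
```

Calling `svr_nesterov(data, dim)` with `data` as a list of `(features, target)` pairs returns the fitted weight list. Averaged over all samples, `vr_grad` equals `full_grad` at the same point, so the estimate stays unbiased, and its variance shrinks as the iterate and the snapshot both approach the optimum.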
