LG · ML · Oct 17, 2019

A Stochastic Variance Reduced Nesterov's Accelerated Quasi-Newton Method

arXiv:1910.07939v1 · 7 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement for training large-scale neural networks, addressing a specific bottleneck in optimization methods.

The paper addresses the high stochastic variance noise of the stochastic Nesterov's Accelerated Quasi-Newton method for training neural networks. It proposes a stochastic variance reduced version that showed improved performance on four benchmark problems.

Recently, algorithms incorporating second-order curvature information have become popular for training neural networks. Nesterov's Accelerated Quasi-Newton (NAQ) method has been shown to effectively accelerate the BFGS quasi-Newton method by incorporating a momentum term and Nesterov's accelerated gradient vector. A stochastic version of the NAQ method was proposed for training large-scale problems. However, this method incurs high stochastic variance noise. This paper proposes a stochastic variance reduced Nesterov's Accelerated Quasi-Newton method in full (SVR-NAQ) and limited (SVRLNAQ) memory forms. The performance of the proposed method is evaluated in TensorFlow on four benchmark problems: two regression and two classification problems. The results show improved performance compared to conventional methods.
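To make the idea in the abstract more concrete, here is a minimal NumPy sketch of how a variance-reduced gradient could be combined with a NAQ-type step. The abstract does not spell out the estimator or the update rule, so several things here are assumptions rather than the authors' implementation: the SVRG-style snapshot correction, the least-squares loss, the full-memory BFGS update of the inverse Hessian approximation, the function names (`batch_grad`, `vr_grad`, `svr_naq_sketch`), and all hyperparameter values.

```python
import numpy as np

def batch_grad(w, X, y, idx):
    """Gradient of 0.5 * mean((X w - y)^2) restricted to the rows in idx."""
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

def vr_grad(w, w_snap, g_snap, X, y, idx):
    """SVRG-style estimate (assumed): batch gradient at w, corrected by the
    same batch's gradient at the snapshot plus the full snapshot gradient."""
    return batch_grad(w, X, y, idx) - batch_grad(w_snap, X, y, idx) + g_snap

def svr_naq_sketch(X, y, epochs=10, batch=50, alpha=0.3, mu=0.5, seed=0):
    """Illustrative loop: variance-reduced gradients inside a NAQ-type step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, v, H = np.zeros(d), np.zeros(d), np.eye(d)  # weights, momentum, inverse Hessian approx.
    eye = np.eye(d)
    for _ in range(epochs):
        # Snapshot: one full-gradient pass per epoch.
        w_snap = w.copy()
        g_snap = batch_grad(w_snap, X, y, np.arange(n))
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            look = w + mu * v                       # Nesterov look-ahead point
            g1 = vr_grad(look, w_snap, g_snap, X, y, idx)
            w_new = look - alpha * H @ g1           # quasi-Newton step from the look-ahead
            g2 = vr_grad(w_new, w_snap, g_snap, X, y, idx)
            s, yv = w_new - look, g2 - g1           # curvature pair on the same batch
            sy = s @ yv
            if sy > 1e-10:                          # skip update if curvature condition fails
                rho = 1.0 / sy
                H = ((eye - rho * np.outer(s, yv)) @ H @ (eye - rho * np.outer(yv, s))
                     + rho * np.outer(s, s))
            v, w = w_new - w, w_new
    return w
```

Calling `svr_naq_sketch(X, y)` on a small synthetic regression problem returns a weight vector. The limited-memory form mentioned in the abstract would replace the dense matrix `H` with a short history of curvature pairs, which this sketch does not attempt.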

