LG, ML · Apr 6, 2020

Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method

arXiv:2004.03040v2 · 1 citation
AI Analysis

This work targets efficient and reliable optimization for deep learning practitioners, though it appears incremental because it combines existing ideas from stochastic quasi-Newton and Gauss-Newton methods.

The authors address the high computational cost and poorly understood convergence behavior of deep learning training by introducing a second-order stochastic quasi-Gauss-Newton (SQGN) method. In tests on MNIST and a seismic tomography application, it achieved excellent accuracy without extensive hyperparameter tuning.

Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood. We introduce and study a second-order stochastic quasi-Gauss-Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem. SQGN provides excellent accuracy without the need for experimenting with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training process. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.
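To make the Gauss-Newton ingredient concrete, here is a minimal sketch of a damped Gauss-Newton iteration on a toy curve-fitting problem. It is not the SQGN method from the paper: it is deterministic and full-batch, and it omits the stochastic sampling, quasi-Newton curvature updates, and variance reduction that the abstract mentions. The model `a * exp(b * x)`, the function names, the damping value, and the backtracking rule are all assumptions chosen for illustration.

```python
import math


def loss(a, b, xs, ys):
    """Half the sum of squared residuals for the toy model a * exp(b * x)."""
    return 0.5 * sum((a * math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))


def gauss_newton_fit(xs, ys, a=1.0, b=0.0, iters=50, damping=1e-8):
    """Fit (a, b) with damped Gauss-Newton steps and simple backtracking.

    Each step solves (J^T J + damping * I) d = -J^T r, where r is the
    residual vector and J its Jacobian with respect to (a, b).
    """
    for _ in range(iters):
        # Accumulate the gradient g = J^T r and the Gauss-Newton matrix H = J^T J.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y              # residual for this sample
            j0, j1 = e, a * x * e      # d r / d a, d r / d b
            g0 += j0 * r
            g1 += j1 * r
            h00 += j0 * j0
            h01 += j0 * j1
            h11 += j1 * j1

        # Small diagonal damping keeps the 2x2 system positive definite.
        h00 += damping
        h11 += damping
        det = h00 * h11 - h01 * h01

        # Closed-form solve of the 2x2 system H d = -g.
        d0 = -(h11 * g0 - h01 * g1) / det
        d1 = -(h00 * g1 - h01 * g0) / det

        # Backtracking: halve the step until the loss does not increase.
        t = 1.0
        current = loss(a, b, xs, ys)
        while t > 1e-10 and loss(a + t * d0, b + t * d1, xs, ys) > current:
            t *= 0.5
        if t <= 1e-10:
            break  # no acceptable step found; stop iterating

        a += t * d0
        b += t * d1
    return a, b


# Noise-free synthetic data from a = 2.0, b = -0.5 (hypothetical values).
xs = [0.25 * i for i in range(9)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]
a_fit, b_fit = gauss_newton_fit(xs, ys)
print(a_fit, b_fit)
```

The point of the sketch is that the curvature matrix `J^T J` is built from first derivatives of the residuals only, which is what distinguishes Gauss-Newton from a full Newton step. The only hyper-parameters here are the damping and the backtracking factor, rather than a hand-tuned learning rate.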

Code Implementations: 1 repo