LG · ML · Jun 5, 2019

Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks

arXiv:1906.02353v1 · 42 citations
Originality: Incremental advance
AI Analysis

This work addresses optimization bottlenecks in deep learning for researchers and practitioners, offering incremental improvements to existing methods.

The paper addresses the cost of training deep neural networks on large data sets. It proposes efficient Levenberg-Marquardt variants of the Gauss-Newton and natural gradient methods that use subsampled curvature matrices with either subsampled or full gradients, and it reports numerical results demonstrating their effectiveness.

We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods use subsampled Gauss-Newton or Fisher information matrices and either subsampled gradient estimates (fully stochastic) or full gradients (semi-stochastic), which, in the latter case, we prove convergent to a stationary point. By using the Sherman-Morrison-Woodbury formula with automatic differentiation (backpropagation) we show how our methods can be implemented to perform efficiently. Finally, numerical results are presented to demonstrate the effectiveness of our proposed methods.
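The role of the Sherman-Morrison-Woodbury formula can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes an explicit subsampled Jacobian `J` of shape (N, n) with N much smaller than n, a gradient `g`, and a damping parameter `lam`, all hypothetical names, whereas the abstract indicates that the paper relies on backpropagation for these products. Under those assumptions, the damped step solving (JᵀJ/N + lam·I) p = −g needs only an N×N solve instead of an n×n one.

```python
import numpy as np

def lm_step_smw(J, g, lam):
    """Damped Gauss-Newton step via Sherman-Morrison-Woodbury.

    Solves (J^T J / N + lam * I) p = -g using only an N x N linear system,
    where N = J.shape[0] is the (assumed small) subsample size.
    """
    N = J.shape[0]
    small = N * lam * np.eye(N) + J @ J.T   # N x N system matrix
    y = np.linalg.solve(small, J @ g)
    return -(g - J.T @ y) / lam

def lm_step_dense(J, g, lam):
    """Reference step that forms the full n x n matrix (for checking only)."""
    N, n = J.shape
    B = J.T @ J / N + lam * np.eye(n)
    return -np.linalg.solve(B, g)

# Synthetic data: 8 sampled rows, 200 parameters (illustrative sizes).
rng = np.random.default_rng(0)
J = rng.standard_normal((8, 200))
g = rng.standard_normal(200)

p_smw = lm_step_smw(J, g, lam=0.1)
p_dense = lm_step_dense(J, g, lam=0.1)
print(np.allclose(p_smw, p_dense))
```

The dense version costs on the order of n³ operations, while the Woodbury version is dominated by forming `J @ J.T` and solving an N×N system, which is what makes small subsamples attractive when n is very large.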

Code Implementations: 1 repo
