LG · ML · May 23, 2024

Exact Gauss-Newton Optimization for Training Deep Neural Networks

arXiv:2405.14402v2 · 12 citations · h-index: 4 · Neurocomputing
Originality: Incremental advance
AI Analysis

This work addresses optimization bottlenecks for large-scale machine learning practitioners, offering an incremental improvement over existing second-order methods.

The paper addresses efficient training of deep neural networks. It proposes Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that uses low-rank linear algebra to compute descent directions, and it reports that EGN matches or exceeds the generalization performance of well-tuned optimizers such as SGD and Adam across supervised and reinforcement learning tasks.

We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, the parameter update is obtained by factorizing a matrix which has the size of the mini-batch. This is particularly advantageous for large-scale machine learning problems where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at most matches, the generalization performance of well-tuned SGD, Adam, GAF, SQN, and SGN optimizers across various supervised and reinforcement learning tasks.
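
To make the "matrix of mini-batch size" idea concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes a least-squares loss with a fixed damping term `lam`, and it uses the push-through identity (J^T J + lam I)^{-1} J^T = J^T (J J^T + lam I)^{-1} as a stand-in for the Duncan-Guttman identity named in the abstract. The function names, the random Jacobian `J`, and the residual vector `r` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gn_direction_naive(J, r, lam):
    """Damped Gauss-Newton direction from the n x n system (n = number of parameters)."""
    n = J.shape[1]
    return -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)


def gn_direction_lowrank(J, r, lam):
    """Same direction from a b x b system (b = batch size), via the push-through identity."""
    b = J.shape[0]
    return -J.T @ np.linalg.solve(J @ J.T + lam * np.eye(b), r)


# Illustrative sizes only: far more parameters than samples in the batch.
rng = np.random.default_rng(0)
b, n = 8, 500
J = rng.standard_normal((b, n))   # stand-in for the mini-batch Jacobian of the residuals
r = rng.standard_normal(b)        # stand-in for the mini-batch residuals
lam = 1e-1                        # assumed damping (regularization) value

d_naive = gn_direction_naive(J, r, lam)
d_lowrank = gn_direction_lowrank(J, r, lam)

# Both routes should agree up to floating-point error.
print(np.allclose(d_naive, d_lowrank))
```

In this sketch the low-rank route solves an 8 x 8 system instead of a 500 x 500 one, which is the kind of saving the abstract describes when the parameter dimension is much larger than the batch size.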

Code Implementations: 1 repo
