LG · ML · Oct 23, 2023

Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks

arXiv:2310.14901v2 · 1 citation · h-index: 10
AI Analysis

This paper addresses the computational burden of Hessian-based optimization for non-convex machine learning problems and offers practitioners a scalable solution, though it appears incremental, as it builds on existing Saddle-Free Newton methods.

The paper tackles the challenge of applying second-order optimization methods to neural networks by proposing a scalable algorithm that asymptotically uses the exact inverse Hessian with absolute-value eigenvalues, demonstrating comparable performance to existing methods in settings like ResNet-18 on CIFAR-10.

Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
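To make the idea in the abstract concrete, here is a minimal sketch of preconditioning a gradient with the inverse absolute-value Hessian, |H|^{-1} g = (H^2)^{-1/2} g, using only Hessian-vector products. It is not taken from the paper: it assumes the textbook binomial series (1 - x)^{-1/2} = sum_k C(2k, k) / 4^k x^k applied to X = I - H^2 / scale, which may differ from the series the authors actually derive. The names `make_hvp`, `abs_inv_hessian_times`, `scale`, and `num_terms` are hypothetical, and the dense 2x2 matrix merely stands in for a Hessian-vector product that would come from automatic differentiation in practice.

```python
import math


def make_hvp(H):
    """Return v -> H v for a small dense symmetric matrix.

    Stand-in only: for a neural network this product would come from
    automatic differentiation rather than an explicit matrix.
    """
    def hvp(v):
        return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]
    return hvp


def abs_inv_hessian_times(hvp, g, scale, num_terms):
    """Approximate |H|^{-1} g = (H^2)^{-1/2} g by a truncated series.

    Assumption (not from the paper): the binomial series
    (1 - x)^{-1/2} = sum_k C(2k, k) / 4^k * x^k with X = I - H^2 / scale.
    It converges when scale >= lambda_max(H)^2 and H has no zero eigenvalue.
    """
    term = list(g)            # holds X^k g, starting at k = 0
    coeff = 1.0               # holds C(2k, k) / 4^k, starting at k = 0
    total = [0.0] * len(g)
    for k in range(num_terms):
        total = [t + coeff * x for t, x in zip(total, term)]
        # term <- X term = term - H (H term) / scale: two Hessian-vector products
        hh = hvp(hvp(term))
        term = [x - y / scale for x, y in zip(term, hh)]
        # coefficient recurrence: c_{k+1} = c_k * (2k + 1) / (2k + 2)
        coeff *= (2 * k + 1) / (2 * k + 2)
    return [t / math.sqrt(scale) for t in total]


# Indefinite toy Hessian: one negative eigenvalue, i.e. a saddle point.
H = [[2.0, 0.0], [0.0, -1.0]]
g = [1.0, 1.0]
step = abs_inv_hessian_times(make_hvp(H), g, scale=4.0, num_terms=200)
```

For this toy case the exact answer is |H|^{-1} g = [0.5, 1.0]: the negative eigenvalue is replaced by its absolute value, so the step no longer points towards the saddle along that direction. Each series term costs two Hessian-vector products, so truncating the series trades accuracy for compute; convergence slows as eigenvalues of H approach zero relative to `scale`.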

Code Implementations: 1 repo