LG, CL, CV · Jul 3, 2024

Gradient descent with generalized Newton's method

arXiv:2407.02772v3 · 8 citations · h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses the time-consuming problem of learning rate tuning faced by machine learning practitioners. The contribution appears incremental, since it builds on existing optimizers such as SGD and Adam.

The authors tackled the problem of intensive learning rate tuning in optimization by proposing the generalized Newton's method (GeN), a Hessian-informed approach that automatically selects learning rates to accelerate convergence. Their experiments on language and vision tasks showed that GeN optimizers match state-of-the-art performance achieved with carefully tuned schedulers.

We propose the generalized Newton's method (GeN) -- a Hessian-informed approach that applies to any optimizer such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects the learning rate that accelerates the convergence, without the intensive tuning of the learning rate scheduler. In practice, our method is easily implementable, since it only requires additional forward passes with almost zero computational overhead (in terms of training time and memory cost), if the overhead is amortized over many iterations. We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) to showcase that GeN optimizers match the state-of-the-art performance, which was achieved with carefully tuned learning rate schedulers.
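The abstract does not spell out how the learning rate is computed, so the following is only a minimal sketch of one way a Hessian-informed step size can be obtained from extra forward passes. It assumes the loss along the update direction g is approximated by a quadratic, L(w - lr*g) ≈ L(w) - lr*(G·g) + (lr²/2)*(gᵀHg), and that the two coefficients are estimated by finite differences from three loss evaluations. The names `quadratic_fit_lr`, `toy_loss`, and the probe step `eta` are hypothetical and not taken from the authors' code.

```python
def quadratic_fit_lr(loss_fn, w, g, eta=0.1):
    """Estimate the step size lr that minimizes loss_fn(w - lr * g).

    Uses three forward passes (no backward pass) and a quadratic fit.
    Illustrative sketch only; not the authors' implementation.
    """
    loss_plus = loss_fn([wi + eta * gi for wi, gi in zip(w, g)])
    loss_zero = loss_fn(w)
    loss_minus = loss_fn([wi - eta * gi for wi, gi in zip(w, g)])

    # Central differences along g:
    #   slope     approximates G . g      (gradient projected on g)
    #   curvature approximates g^T H g    (Hessian quadratic form)
    slope = (loss_plus - loss_minus) / (2.0 * eta)
    curvature = (loss_plus - 2.0 * loss_zero + loss_minus) / eta**2

    if curvature <= 0.0:
        # Assumed fallback: the quadratic has no minimum along g,
        # so keep the probe step size instead.
        return eta
    return slope / curvature


def toy_loss(w):
    # Hypothetical quadratic loss with Hessian diag(3, 1).
    return 0.5 * (3.0 * w[0] ** 2 + w[1] ** 2)


w = [1.0, 2.0]
grad = [3.0 * w[0], 1.0 * w[1]]        # analytic gradient of toy_loss

# Case 1: g is the plain gradient (an SGD-style direction).
lr_sgd = quadratic_fit_lr(toy_loss, w, grad)

# Case 2: g is the Newton direction H^{-1} G; the fitted lr is close to 1,
# which is the Newton-Raphson step.
newton_dir = [grad[0] / 3.0, grad[1] / 1.0]
lr_newton = quadratic_fit_lr(toy_loss, w, newton_dir)
w_newton = [wi - lr_newton * gi for wi, gi in zip(w, newton_dir)]
```

On this toy loss the fit is exact up to floating-point error, because the loss is itself quadratic. On a real network the estimate would be noisy, and since it costs only forward passes it could be refreshed every few iterations rather than at every step, which is how the overhead would be amortized.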

Code Implementations: 1 repo