OCLGSep 29, 2024

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

arXiv:2409.19791v29 citationsh-index: 33
Originality Highly original
AI Analysis

This work addresses a fundamental problem in optimization theory for researchers and practitioners, offering a novel approach to convergence under weaker conditions, though it is incremental in refining existing gradient descent methods.

The paper challenges the belief that linear convergence of gradient descent requires quadratic growth, showing that gradient descent with an adaptive stepsize achieves nearly linear convergence under fourth-order growth conditions. It demonstrates this on problems like matrix sensing and learning a single neuron, with concrete convergence rates.

A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes