OC LG NA ML
Oct 21, 2019

Adaptive Gradient Descent without Descent

arXiv:1910.09529v2
179 citations
Originality: Incremental advance
AI Analysis

This work aims to simplify and generalize gradient descent for machine learning practitioners, though it appears incremental because it builds on existing adaptive methods.

The paper tackles the problem of automating gradient descent by proposing a simple method that adapts to local geometry without needing function values or line searches. For convex problems it converges even when the global smoothness constant is infinite, and it is evaluated on convex and nonconvex problems including logistic regression and matrix factorization.

We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on the smoothness in a neighborhood of a solution. Given that the problem is convex, our method converges even if the global smoothness constant is infinity. As an illustration, it can minimize an arbitrary twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
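The two rules in the abstract can be illustrated with a short sketch. The abstract does not spell out the step-size formulas, so the ones below are an assumption about how the rules might be instantiated: rule 1 caps the new step at sqrt(1 + theta) times the previous step, where theta is the ratio of the last two steps, and rule 2 caps it at the inverse of twice a local curvature estimate built from consecutive iterates and gradients. The function name `adaptive_gd`, the parameters `lam0` and `tol`, and the quadratic test problem are hypothetical choices made for illustration.

```python
import numpy as np


def adaptive_gd(grad, x0, lam0=1e-6, n_iters=1000, tol=1e-10):
    """Gradient descent whose step size is estimated from gradients only.

    Illustrative sketch; the exact caps below are assumed, not quoted
    from the abstract.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    lam_prev = lam0
    theta = np.inf                      # no growth limit on the first estimate
    x = x_prev - lam_prev * g_prev      # tiny probing step to get two points

    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:    # stop once the gradient is negligible
            break
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # Rule 1 (assumed form): do not increase the step size too fast.
        growth_cap = np.sqrt(1.0 + theta) * lam_prev
        # Rule 2 (assumed form): do not overstep the local curvature,
        # estimated as dg / dx along the last step.
        curvature_cap = dx / (2.0 * dg) if dg > 0 else np.inf
        lam = min(growth_cap, curvature_cap)
        if not np.isfinite(lam):        # both caps inactive: keep the old step
            lam = lam_prev
        x_prev, g_prev = x, g
        x = x - lam * g
        theta = lam / lam_prev
        lam_prev = lam
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(a * x**2) - sum(b * x),
# whose minimizer is b / a.
a = np.array([1.0, 10.0])
b = np.array([1.0, 1.0])
x_hat = adaptive_gd(lambda x: a * x - b, np.zeros(2), n_iters=2000)
```

Note that the loop only ever calls `grad`; no function values or line search appear, which is the point the abstract emphasizes. The small `lam0` merely produces a second point from which the first curvature estimate can be formed.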

Code Implementations: 1 repo
