OCLGNov 15, 2023

Non-Uniform Smoothness for Gradient Descent

arXiv:2311.08615v16 citationsh-index: 18
Originality Highly original
AI Analysis

This work addresses the hyperparameter tuning bottleneck in optimization for machine learning practitioners, offering a novel method with theoretical improvements.

The authors tackled the problem of expensive hyperparameter tuning in gradient descent by introducing a local first-order smoothness oracle (LFSO) that generalizes Lipschitz continuity and encodes problem information for stepsize tuning, resulting in global linear convergence rates for non-strongly convex problems with flat minima, improving over lower bounds for general first-order methods.

The analysis of gradient descent-type methods typically relies on the Lipschitz continuity of the objective gradient. This generally requires an expensive hyperparameter tuning process to appropriately calibrate a stepsize for a given problem. In this work we introduce a local first-order smoothness oracle (LFSO) which generalizes the Lipschitz continuous gradients smoothness condition and is applicable to any twice-differentiable function. We show that this oracle can encode all relevant problem information for tuning stepsizes for a suitably modified gradient descent method and give global and local convergence results. We also show that LFSOs in this modified first-order method can yield global linear convergence rates for non-strongly convex problems with extremely flat minima, and thus improve over the lower bound on rates achievable by general (accelerated) first-order methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes