OCLGMLJun 2, 2023

Convex and Non-convex Optimization Under Generalized Smoothness

arXiv:2306.01264v277 citationsh-index: 67
AI Analysis

This work provides a more flexible theoretical framework for optimization algorithms, benefiting researchers and practitioners in machine learning and optimization by handling broader function classes and noise conditions.

The paper tackles the problem of optimizing functions under generalized smoothness conditions beyond classical Lipschitz gradients, developing a new analysis technique that bounds gradients along trajectories to achieve classical convergence rates for gradient descent and Nesterov's accelerated gradient in both convex and non-convex settings, without requiring gradient clipping and allowing heavy-tailed noise.

Classical analysis of convex and non-convex optimization methods often requires the Lipshitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple, yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex setting under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes