OCLGSep 23, 2024

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

arXiv:2409.14989v238 citationsh-index: 18
Originality Incremental advance
AI Analysis

This work addresses optimization challenges in machine learning for researchers, offering incremental improvements in convergence guarantees under generalized smoothness.

The paper tackles optimization for convex functions under the (L0,L1)-smoothness assumption, deriving improved convergence rates for gradient descent with clipping and Polyak stepsizes, and proposing a new accelerated method, with results avoiding exponential dependency on initial conditions.

Due to the non-smoothness of optimization problems in Machine Learning, generalized smoothness assumptions have been gaining a lot of attention in recent years. One of the most popular assumptions of this type is $(L_0,L_1)$-smoothness (Zhang et al., 2020). In this paper, we focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with (Smoothed) Gradient Clipping and for Gradient Descent with Polyak Stepsizes. In contrast to the existing results, our rates do not rely on the standard smoothness assumption and do not suffer from the exponential dependency from the initial distance to the solution. We also extend these results to the stochastic case under the over-parameterization assumption, propose a new accelerated method for convex $(L_0,L_1)$-smooth optimization, and derive new convergence rates for Adaptive Gradient Descent (Malitsky and Mishchenko, 2020).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes