Mirror Descent Under Generalized Smoothness
This work addresses a foundational problem in optimization theory by enabling fast convergence rates for non-smooth objectives in non-Euclidean geometries, which is incremental but broadens applicability in machine learning.
The paper tackles the limitation of existing smoothness generalizations to Euclidean geometry by introducing a new $\ell*$-smoothness concept for general norms, establishing convergence rates for mirror-descent-type algorithms that match classic smoothness results, and extending this to stochastic optimization with a new bounded noise condition.
Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by allowing the Lipschitz constant of the gradient to grow with respect to the gradient norm, which accommodates a broad range of objectives in practice. Despite this progress, existing generalizations of smoothness are restricted to Euclidean geometry with $\ell_2$-norm and only have theoretical guarantees for optimization in the Euclidean space. In this paper, we address this limitation by introducing a new $\ell*$-smoothness concept that measures the norm of Hessians in terms of a general norm and its dual, and establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness. Notably, we propose a generalized self-bounding property that facilitates bounding the gradients via controlling suboptimality gaps, serving as a principal component for convergence analysis. Beyond deterministic optimization, we establish an anytime convergence for stochastic mirror descent based on a new bounded noise condition that encompasses the widely adopted bounded or affine noise assumptions.