LG · Nov 25, 2025

A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent

Shuo Xie, Tianhao Wang, Beining Wu, Zhiyuan Li

arXiv:2511.20584v1 · 4 citations

Originality: Highly original

AI Analysis

Aimed at researchers in machine learning and optimization, this work provides foundational insights into optimization theory by bridging adaptive and non-Euclidean methods.

The paper studies the theory of adaptive optimizers. It extends the concept of adaptive smoothness to the nonconvex setting and shows that this notion characterizes their convergence. It also proves that adaptive smoothness enables acceleration with Nesterov momentum in the convex setting, a guarantee that is unattainable under standard smoothness for certain geometries.

Adaptive optimizers can reduce to normalized steepest descent (NSD) when only adapting to the current gradient, suggesting a close connection between the two algorithmic families. A key distinction between their analyses, however, lies in the geometries (e.g., the smoothness notions) they rely on. In the convex setting, adaptive optimizers are governed by a stronger adaptive smoothness condition, while NSD relies on the standard notion of smoothness. We extend the theory of adaptive smoothness to the nonconvex setting and show that it precisely characterizes the convergence of adaptive optimizers. Moreover, we establish that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum in the convex setting, a guarantee unattainable under standard smoothness for certain non-Euclidean geometries. We further develop an analogous comparison for stochastic optimization by introducing adaptive gradient variance, which parallels adaptive smoothness and leads to dimension-free convergence guarantees that cannot be achieved under standard gradient variance for certain non-Euclidean geometries.
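The opening claim, that an adaptive optimizer "only adapting to the current gradient" reduces to NSD, can be illustrated with a small sketch. It assumes that this case corresponds to standard Adam with beta1 = beta2 = 0 and eps = 0. Under that assumption, the Adam step becomes g / |g| = sign(g), which is exactly the steepest-descent direction over the l_inf unit ball (its inner product with g equals the dual l_1 norm of g). The function names `nsd_direction`, `nsd_step`, and `adam_step` are hypothetical, and the paper's own formal setting may differ.

```python
import numpy as np


def nsd_direction(g, norm="linf"):
    """Direction d maximizing <g, d> subject to ||d|| <= 1 for the given norm."""
    if norm == "l2":
        n = np.linalg.norm(g)
        return g / n if n > 0 else np.zeros_like(g)
    if norm == "linf":
        # Over the l_inf unit ball the maximizer is sign(g); 0 is a valid choice where g_i = 0.
        return np.sign(g)
    raise ValueError(f"unsupported norm: {norm}")


def nsd_step(x, g, lr, norm="linf"):
    """One normalized steepest descent step: x <- x - lr * d."""
    return x - lr * nsd_direction(g, norm)


def adam_step(x, g, m, v, lr, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step with bias correction (t >= 1 is the step count)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    denom = np.sqrt(v_hat) + eps
    # Where denom is 0 (only possible when eps = 0 and g_i = 0), use an update of 0.
    update = np.divide(m_hat, denom, out=np.zeros_like(m_hat), where=denom > 0)
    return x - lr * update, m, v


if __name__ == "__main__":
    x = np.array([1.0, -2.0, 0.5, 3.0])
    g = np.array([0.3, -4.0, 0.0, 1e-3])
    # Adam that "only adapts to the current gradient": no momentum, no second-moment history.
    x_adam, _, _ = adam_step(x, g, np.zeros(4), np.zeros(4), lr=0.1, t=1,
                             beta1=0.0, beta2=0.0, eps=0.0)
    x_nsd = nsd_step(x, g, lr=0.1, norm="linf")
    print(x_adam, x_nsd)
```

With beta1 = beta2 = 0 the bias-correction factors are 1, so the step depends only on the sign pattern of the current gradient and is independent of its magnitude. Nonzero beta2 or eps reintroduces a dependence on gradient history or scale, which is where the two families start to differ.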
