OC LG, Jun 14, 2025

Glocal Smoothness: Line Search can really help!

Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt

Stanford

arXiv:2506.12648v1, 14.45 citations, h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in optimization theory by enabling direct comparison of iteration complexities across algorithms. The advance is incremental but useful for researchers and practitioners in machine learning and optimization.

The paper addresses the difficulty of comparing iteration complexities across optimization algorithms. It introduces a 'glocal smoothness' assumption that depends only on properties of the function, not on the algorithm's iterates, and uses it to show that line searches can outperform fixed step sizes. In some settings, gradient descent with line search attains a better iteration complexity than accelerated methods with fixed step sizes.

Iteration complexities for first-order optimization algorithms are typically stated in terms of a global Lipschitz constant of the gradient, and near-optimal results are achieved using fixed step sizes. But many objective functions that arise in practice have regions with small Lipschitz constants where larger step sizes can be used. Many local Lipschitz assumptions have been proposed, which have led to results showing that adaptive step sizes and/or line searches yield improved convergence rates over fixed step sizes. However, these faster rates tend to depend on the iterates of the algorithm, which makes it difficult to compare the iteration complexities of different methods. We consider a simple characterization of global and local ("glocal") smoothness that only depends on properties of the function. This allows upper bounds on iteration complexities in terms of iterate-independent constants and enables us to compare iteration complexities between algorithms. Under this assumption it is straightforward to show the advantages of line searches over fixed step sizes, and that in some settings, gradient descent with line search has a better iteration complexity than accelerated methods with fixed step sizes. We further show that glocal smoothness can lead to improved complexities for the Polyak and AdGD step sizes, as well as other algorithms including coordinate optimization, stochastic gradient methods, accelerated gradient methods, and non-linear conjugate gradient methods.
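The gap between a global Lipschitz constant and a smaller local one can be seen on a toy problem. The sketch below is an illustration only and is not taken from the paper: the one-dimensional objective `f`, the constants 1 and 100, the starting point, and the Armijo parameters are all assumptions chosen for the example. The gradient of `f` has Lipschitz constant 100 globally but only 1 on the region |x| <= 1 around the minimizer, so a fixed step size of 1/100 is safe everywhere but very conservative near the solution, while a backtracking line search can accept a step size close to 1 once the iterate is in that region.

```python
# Toy objective (hypothetical, for illustration): curvature 1 for |x| <= 1,
# curvature 100 outside, so the global gradient Lipschitz constant is 100
# while the local constant near the minimizer x = 0 is 1.
def f(x):
    a = abs(x)
    if a <= 1.0:
        return 0.5 * x * x
    return 0.5 + (a - 1.0) + 50.0 * (a - 1.0) ** 2


def grad(x):
    a = abs(x)
    if a <= 1.0:
        return x
    sign = 1.0 if x > 0 else -1.0
    return sign * (1.0 + 100.0 * (a - 1.0))


def gd_fixed(x0, step, tol=1e-8, max_iter=100000):
    """Gradient descent with a fixed step size; returns (x, iterations used)."""
    x = x0
    for k in range(max_iter):
        g = grad(x)
        if abs(g) <= tol:
            return x, k
        x -= step * g
    return x, max_iter


def gd_backtracking(x0, step0=1.0, shrink=0.5, c=1e-4, tol=1e-8, max_iter=100000):
    """Gradient descent with Armijo backtracking, restarted from step0 each iteration."""
    x = x0
    for k in range(max_iter):
        g = grad(x)
        if abs(g) <= tol:
            return x, k
        step = step0
        # Shrink the step until the sufficient-decrease (Armijo) condition holds.
        while f(x - step * g) > f(x) - c * step * g * g:
            step *= shrink
        x -= step * g
    return x, max_iter


# Fixed step 1/L with the global constant L = 100, versus backtracking line search.
x_fixed, iters_fixed = gd_fixed(3.0, step=1.0 / 100.0)
x_ls, iters_ls = gd_backtracking(3.0)
print("fixed step iterations:", iters_fixed)
print("line search iterations:", iters_ls)
```

Iteration counts here ignore the extra function evaluations spent inside the backtracking loop, so this compares iteration complexity only, which is the quantity the abstract discusses.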
