Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
This work addresses the limitation of existing accelerated methods in adapting to local curvature for convex optimization, which is important for researchers and practitioners in optimization and machine learning, though it is incremental as it builds on prior adaptive and accelerated methods.
The paper tackles the problem of accelerating the adaptive gradient method GRAAL to match the optimal convergence rate of Nesterov's accelerated gradient descent, while maintaining its ability to adapt to local curvature without hyperparameter tuning. The result is a new algorithm, GRAAL with Nesterov acceleration, which achieves near-optimal iteration complexities for L-smooth functions and under a more general smoothness assumption.
In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function, $\min_x f(x)$. Recently, Malitsky (2020); Alacaoglu et al.(2023) developed an adaptive first-order method, GRAAL. This algorithm computes stepsizes by estimating the local curvature of the objective function without any line search procedures or hyperparameter tuning, and attains the standard iteration complexity $\mathcal{O}(L\lVert x_0-x^*\rVert^2/ε)$ of fixed-stepsize gradient descent for $L$-smooth functions. However, a natural question arises: is it possible to accelerate the convergence of GRAAL to match the optimal complexity $\mathcal{O}(\sqrt{L\lVert x_0-x^*\rVert^2/ε})$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made by Li and Lan (2025); Suh and Ma (2025), the ability of existing accelerated algorithms to adapt to the local curvature of the objective function is highly limited. We resolve this issue and develop GRAAL with Nesterov acceleration, which can adapt its stepsize to the local curvature at a geometric, or linear, rate just like non-accelerated GRAAL. We demonstrate the adaptive capabilities of our algorithm by proving that it achieves near-optimal iteration complexities for $L$-smooth functions, as well as under a more general $(L_0,L_1)$-smoothness assumption (Zhang et al., 2019).