OC LGApr 3, 2024

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Aaron Mishkin, Mert Pilanci, Mark Schmidt

Stanford

arXiv:2404.02378v25.62 citationsh-index: 25

Originality Incremental advance

AI Analysis

This work addresses the issue of potentially worse guarantees for stochastic acceleration compared to SGD, offering incremental improvements for optimization in machine learning.

The paper tackles the problem of improving convergence rates for stochastic accelerated gradient descent under interpolation conditions, achieving a reduction in dependence on the strong growth constant from ρ to √ρ, which is comparable to a square-root improvement in the worst-case condition number.

We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from $ρ$ to $\sqrtρ$ as compared to prior work. This improvement is comparable to a square-root of the condition number in the worst case and address criticism that guarantees for stochastic acceleration could be worse than those for SGD.

View on arXiv PDF

Similar