OCLGAug 3, 2024

Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $α$-PL

arXiv:2408.01839v22 citationsh-index: 56
Originality Incremental advance
AI Analysis

This work provides theoretical convergence guarantees for stochastic optimization algorithms in non-convex settings, which is incremental but important for researchers in optimization and machine learning.

The paper tackles the problem of analyzing the local convergence rates of stochastic first-order methods under a local α-Polyak-Lojasiewicz condition, establishing a lower bound of Ω(ε^{-2/α}) and a matching upper bound of O(ε^{-2/α}) for 1 ≤ α < 2 using a SARAH-type method with time-varying parameters.

We study the local convergence rate of stochastic first-order methods under a local $α$-Polyak-Lojasiewicz ($α$-PL) condition in a neighborhood of a target connected component $\mathcal{M}$ of the local minimizer set. The parameter $α\in [1,2]$ is the exponent of the gradient norm in the $α$-PL inequality: $α=2$ recovers the classical PL case, $α=1$ corresponds to Holder-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output $\hat{x}$ with $F(\hat{x})-l \le \varepsilon$, where $l := F(y)$ for any $y \in \mathcal{M}$. We work in a local regime where the algorithm is initialized near $\mathcal{M}$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound $Ω(\varepsilon^{-2/α})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound $\mathcal{O}(\varepsilon^{-2/α})$ for $1 \le α< 2$ via a SARAH-type variance-reduced method with time-varying batch sizes and step sizes. In the convex setting, assuming a local $α$-PL condition on the $\varepsilon$-sublevel set, we further show a complexity lower bound $\widetildeΩ(\varepsilon^{-2/α})$ for reaching an $\varepsilon$-global optimum, matching the $\varepsilon$-dependence of known accelerated stochastic subgradient methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes