Finding Local Minima via Stochastic Nested Variance Reduction
This work addresses optimization efficiency for machine learning practitioners, offering incremental improvements in algorithm speed for nonconvex settings.
The paper tackles the problem of finding local minima in nonconvex optimization by proposing two algorithms that achieve faster convergence than state-of-the-art methods, with specific gradient complexity improvements such as [3mO[0m(n^{1/2}ε^{-2}+nε_H^{-3}+n^{3/4}ε_H^{-7/2}) for finite-sum problems.
We propose two algorithms that can find local minima faster than the state-of-the-art algorithms in both finite-sum and general stochastic nonconvex optimization. At the core of the proposed algorithms is $\text{One-epoch-SNVRG}^+$ using stochastic nested variance reduction (Zhou et al., 2018a), which outperforms the state-of-the-art variance reduction algorithms such as SCSG (Lei et al., 2017). In particular, for finite-sum optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{finite}}$ algorithm achieves $\tilde{O}(n^{1/2}ε^{-2}+nε_H^{-3}+n^{3/4}ε_H^{-7/2})$ gradient complexity to converge to an $(ε, ε_H)$-second-order stationary point, which outperforms $\text{SVRG}+\text{Neon2}^{\text{finite}}$ (Allen-Zhu and Li, 2017) , the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed $\text{SNVRG}^{+}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(ε^{-3}+ε_H^{-5}+ε^{-2}ε_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2017) and Natasha2 (Allen-Zhu, 2017) in certain regimes. Furthermore, we explore the acceleration brought by third-order smoothness of the objective function.