Momentum with Variance Reduction for Nonconvex Composition Optimization
This work addresses convergence acceleration for nonconvex composition optimization, which is widely applied in machine learning, though it appears incremental as it builds on existing momentum and variance reduction techniques.
The paper develops momentum schemes with SPIDER-based variance reduction for nonconvex composition optimization in machine learning. The resulting algorithm achieves near-optimal sample complexity and converges linearly under gradient dominance conditions, and numerical experiments show significantly faster convergence than existing algorithms.
Composition optimization is widely applied in nonconvex machine learning. Various advanced stochastic algorithms that adopt momentum and variance reduction techniques have been developed for composition optimization. However, these algorithms do not fully exploit both techniques to accelerate convergence, and they lack convergence guarantees in nonconvex optimization. This paper complements the existing literature by developing various momentum schemes with SPIDER-based variance reduction for nonconvex composition optimization. In particular, our momentum design requires fewer proximal mapping evaluations per iteration than the existing Katyusha momentum. Furthermore, our algorithm achieves near-optimal sample complexity in both nonconvex finite-sum and online composition optimization, and it achieves a linear convergence rate under the gradient dominant condition. Numerical experiments demonstrate that our algorithm converges significantly faster than existing algorithms in nonconvex composition optimization.
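For readers unfamiliar with SPIDER-based variance reduction, the following is a minimal sketch of the recursive gradient estimator on a toy single-level least-squares problem. It is not the algorithm proposed in the paper: it omits the composition structure, the momentum schemes, and the proximal step. The function name `spider_gd`, the objective, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def spider_gd(A, b, x0, step=0.05, epoch_len=10, batch_size=8,
              n_iters=200, seed=0):
    """Gradient descent with a SPIDER-style estimator on the toy
    objective 0.5 * mean((A @ x - b) ** 2). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = x0.copy()
    x_prev = x.copy()
    v = np.zeros_like(x)
    for t in range(n_iters):
        if t % epoch_len == 0:
            # Start of an epoch: reset the estimator with the full gradient.
            v = A.T @ (A @ x - b) / n
        else:
            # Inside an epoch: correct the previous estimate with the
            # difference of minibatch gradients at the current and the
            # previous iterate, evaluated on the same sampled indices.
            idx = rng.integers(0, n, size=batch_size)
            A_b, b_b = A[idx], b[idx]
            g_new = A_b.T @ (A_b @ x - b_b) / batch_size
            g_old = A_b.T @ (A_b @ x_prev - b_b) / batch_size
            v = v + g_new - g_old
        x_prev = x
        x = x - step * v  # plain gradient step, no momentum or prox
    return x
```

The point of the sketch is that the estimator reuses its previous value and only pays for a small minibatch difference between full-gradient resets, which is the mechanism behind the sample complexity savings usually attributed to SPIDER.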