Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization
This work addresses slow practical convergence in nonconvex optimization, a concern for machine learning practitioners, though the contribution is incremental because it builds on the existing SPIDER method.
The paper aims to improve the practical performance of stochastic variance-reduced algorithms such as SPIDER for nonconvex composite optimization. It develops novel momentum schemes that achieve near-optimal gradient oracle complexity and reports superior performance in experiments.
Two new stochastic variance-reduced algorithms, SARAH and SPIDER, have recently been proposed, and SPIDER has been shown to achieve a near-optimal gradient oracle complexity for nonconvex optimization. However, the theoretical advantage of SPIDER does not lead to a substantial improvement in practical performance over SVRG. Momentum is a natural candidate for closing this gap, but existing momentum schemes used in variance-reduced algorithms are designed specifically for convex optimization and are not applicable to nonconvex scenarios. In this paper, we develop novel momentum schemes with flexible coefficient settings to accelerate SPIDER for nonconvex and nonsmooth composite optimization, and show that the resulting algorithms achieve the near-optimal gradient oracle complexity for reaching a generalized first-order stationary condition. Furthermore, we generalize our algorithm to online nonconvex and nonsmooth optimization, and establish an oracle complexity result that matches the state of the art. Our extensive experiments demonstrate the superior performance of our proposed algorithm over other stochastic variance-reduced algorithms.
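To make the idea concrete, the following is a minimal sketch of how a SPIDER-style recursive gradient estimator might be combined with a momentum (extrapolation) step and a proximal update for a composite objective f(x) + h(x). It is not the paper's algorithm as stated: the three-sequence update, the coefficient alpha = 2/(k+2), the shared step size, the nonconvex loss r^2/(1+r^2), the L1 regularizer, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(u, t):
    """Proximal operator of t * ||.||_1 (assumed nonsmooth term h)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def grad_batch(A, b, z, idx):
    """Average gradient over rows idx of the nonconvex loss r^2 / (1 + r^2),
    where r = A z - b. The loss is an illustrative choice."""
    r = A[idx] @ z - b[idx]
    w = 2.0 * r / (1.0 + r**2) ** 2  # derivative of r^2 / (1 + r^2) w.r.t. r
    return A[idx].T @ w / len(idx)


def objective(A, b, x, lam):
    """Composite objective: mean smooth loss plus lam * ||x||_1."""
    r = A @ x - b
    return np.mean(r**2 / (1.0 + r**2)) + lam * np.sum(np.abs(x))


def prox_spider_momentum(A, b, lam=0.01, step=0.1, epochs=20, q=10,
                         batch=10, seed=0):
    """Hypothetical proximal SPIDER loop with a momentum step.

    Every q iterations the estimator v is reset to the full gradient;
    in between it is updated recursively from a mini-batch difference.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    y = np.zeros(d)
    z_prev = None
    v = None
    for k in range(epochs * q):
        alpha = 2.0 / (k + 2)              # assumed momentum coefficient
        z = (1.0 - alpha) * y + alpha * x  # extrapolated point
        if k % q == 0:
            v = grad_batch(A, b, z, np.arange(n))  # full-gradient reset
        else:
            idx = rng.integers(0, n, size=batch)
            # SPIDER recursion: correct the previous estimate with a
            # mini-batch gradient difference between consecutive points.
            v = grad_batch(A, b, z, idx) - grad_batch(A, b, z_prev, idx) + v
        x = soft_threshold(x - step * v, step * lam)  # proximal step from x
        y = soft_threshold(z - step * v, step * lam)  # proximal step from z
        z_prev = z
    return y
```

As a usage example, one could generate a synthetic matrix `A` and targets `b = A @ x_true + noise`, call `prox_spider_momentum(A, b)`, and compare `objective` at the returned point with its value at the zero vector. The step size and batch size here are untuned placeholders and do not reflect the parameter settings analyzed in the paper.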