Stochastic Recursive Variance-Reduced Cubic Regularization Methods
This work addresses computational bottlenecks in high-dimensional nonconvex optimization for machine learning and data science applications, offering incremental improvements over existing SVRC methods.
The authors tackled the computational inefficiency of existing Stochastic Variance-Reduced Cubic regularization (SVRC) algorithms for nonconvex finite-sum optimization by proposing a Stochastic Recursive Variance-Reduced Cubic regularization method (SRVRC) with improved gradient and Hessian complexities, and a Hessian-free variant (SRVRC$_{\ ext{free}}$) that achieves a runtime complexity of $\ ilde O(dn\\epsilon^{-2} \\land d\\epsilon^{-3})$, outperforming prior methods like the $\ ilde O(d\\epsilon^{-3.5})$ complexity from Tripuraneni et al. 2018.
Stochastic Variance-Reduced Cubic regularization (SVRC) algorithms have received increasing attention due to its improved gradient/Hessian complexities (i.e., number of queries to stochastic gradient/Hessian oracles) to find local minima for nonconvex finite-sum optimization. However, it is unclear whether existing SVRC algorithms can be further improved. Moreover, the semi-stochastic Hessian estimator adopted in existing SVRC algorithms prevents the use of Hessian-vector product-based fast cubic subproblem solvers, which makes SVRC algorithms computationally intractable for high-dimensional problems. In this paper, we first present a Stochastic Recursive Variance-Reduced Cubic regularization method (SRVRC) using a recursively updated semi-stochastic gradient and Hessian estimators. It enjoys improved gradient and Hessian complexities to find an $(ε, \sqrtε)$-approximate local minimum, and outperforms the state-of-the-art SVRC algorithms. Built upon SRVRC, we further propose a Hessian-free SRVRC algorithm, namely SRVRC$_{\text{free}}$, which only requires stochastic gradient and Hessian-vector product computations, and achieves $\tilde O(dnε^{-2} \land dε^{-3})$ runtime complexity, where $n$ is the number of component functions in the finite-sum structure, $d$ is the problem dimension, and $ε$ is the optimization precision. This outperforms the best-known runtime complexity $\tilde O(dε^{-3.5})$ achieved by stochastic cubic regularization algorithm proposed in Tripuraneni et al. 2018.