Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates
This work addresses the computational bottleneck in stochastic optimization by providing faster, curvature-adaptive methods for large-scale machine learning problems, though it is incremental relative to existing second-order approaches.
The authors introduce two simple stochastic second-order methods, SN and SCN, for minimizing the average of a large number of smooth, strongly convex functions. Both achieve local linear-quadratic convergence, and their simplest variants compute the gradient and Hessian of only a single randomly selected function per iteration.
We present two new, remarkably simple stochastic second-order methods for minimizing the average of a very large number of sufficiently smooth and strongly convex functions. The first is a stochastic variant of Newton's method (SN), and the second is a stochastic variant of cubically regularized Newton's method (SCN). We establish local linear-quadratic convergence results. Existing stochastic variants of second-order methods require the evaluation of a large number of gradients and/or Hessians in each iteration to guarantee convergence; our methods do not have this shortcoming. For instance, the simplest variants of our methods need to compute, in each iteration, the gradient and Hessian of only a single randomly selected function. In contrast to most existing stochastic Newton and quasi-Newton methods, our approach guarantees local convergence faster than is possible with a first-order oracle, and it adapts to the problem's curvature. Interestingly, our method is not unbiased, so our theory provides new intuition for designing stochastic methods.
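The abstract does not state the update rule, so the following is only a minimal sketch of how a single-sample stochastic Newton iteration of this kind might look, not the authors' exact algorithm. It assumes one plausible form: each function f_i keeps a stored point w_i, the next iterate minimizes the average of the second-order Taylor models built at those stored points, and then tau randomly chosen functions refresh their gradient and Hessian at the new iterate. The names `stochastic_newton`, `grad`, `hess`, `tau`, and the ridge least-squares test problem are all hypothetical choices made for illustration.

```python
import numpy as np

def stochastic_newton(grad, hess, n, x0, num_iters=100, tau=1, seed=0):
    """Sketch of an SN-style iteration (assumed form, see lead-in).

    grad(i, x) and hess(i, x) return the gradient and Hessian of f_i at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # w[i] is the point at which f_i was last evaluated (one-off O(n) setup).
    w = np.tile(x, (n, 1))
    H = np.stack([hess(i, w[i]) for i in range(n)]).astype(float)
    g = np.stack([grad(i, w[i]) for i in range(n)]).astype(float)
    # v[i] = H_i w_i - g_i, the linear term of the i-th quadratic model.
    v = np.einsum("ijk,ik->ij", H, w) - g
    H_sum, v_sum = H.sum(axis=0), v.sum(axis=0)
    for _ in range(num_iters):
        # Minimize the averaged quadratic models; the 1/n factors cancel.
        x = np.linalg.solve(H_sum, v_sum)
        # Refresh only tau randomly selected functions at the new iterate.
        for i in rng.choice(n, size=tau, replace=False):
            H_sum -= H[i]
            v_sum -= v[i]
            w[i], H[i], g[i] = x, hess(i, x), grad(i, x)
            v[i] = H[i] @ w[i] - g[i]
            H_sum += H[i]
            v_sum += v[i]
    return x

# Sanity check on hypothetical ridge least-squares components:
# f_i(x) = 0.5 * (a_i^T x - y_i)^2 + 0.5 * lam * ||x||^2
data_rng = np.random.default_rng(1)
n, d, lam = 50, 3, 0.1
A = data_rng.normal(size=(n, d))
y = data_rng.normal(size=n)
grad = lambda i, x: A[i] * (A[i] @ x - y[i]) + lam * x
hess = lambda i, x: np.outer(A[i], A[i]) + lam * np.eye(d)

x_hat = stochastic_newton(grad, hess, n, np.zeros(d), num_iters=20)
x_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ y / n)
```

Because every f_i in this check is quadratic, the averaged model is exact and `x_hat` should agree with the closed-form minimizer `x_star` up to rounding; this only verifies the bookkeeping. For non-quadratic losses the abstract promises local convergence only, so the starting point would need to be close enough to the solution. Note that each iteration touches the gradient and Hessian of just `tau` functions, while the running sums `H_sum` and `v_sum` carry the stale information from all the others.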