A Proximal Stochastic Quasi-Newton Algorithm
This addresses optimization challenges in large-scale machine learning, but it appears incremental as it builds on existing proximal and stochastic methods.
The paper tackles the problem of minimizing a sum of smooth and non-smooth convex functions, where the smooth part is an average of many components, by proposing a proximal stochastic second-order method that achieves linear convergence.
In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and the non-smooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. It incorporates the Hessian in the smooth part of the function and exploits multistage scheme to reduce the variance of the stochastic gradient. We prove that our method can achieve linear rate of convergence.