Stochastic quasi-Newton with adaptive step lengths for large-scale problems
This work addresses the efficiency of optimization for large-scale machine learning problems and is an incremental improvement over existing stochastic quasi-Newton methods.
The authors tackled large-scale stochastic optimization by developing a numerically robust method that exploits local geometry through an auxiliary variable construction and an inverse Hessian approximation. The method showed encouraging performance on real-world benchmark problems with millions of observations and unknowns.
We provide a numerically robust and fast method capable of exploiting the local geometry when solving large-scale stochastic optimisation problems. Our key innovation is an auxiliary variable construction coupled with an inverse Hessian approximation computed using a receding history of iterates and gradients. It is the Markov chain nature of the classic stochastic gradient algorithm that enables this development. The construction offers a mechanism for a stochastic line search that adapts the step length. We evaluate the method numerically and compare it against the current state of the art, with encouraging performance on real-world benchmark problems where the numbers of observations and unknowns are on the order of millions.
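The abstract does not spell out the auxiliary variable construction or the stochastic line search, so the sketch below illustrates only the generic ingredient it names: an inverse Hessian approximation built from a receding history of iterate and gradient differences. It uses the standard L-BFGS two-loop recursion over a fixed-length window, mini-batch gradients of a least-squares problem, and curvature pairs measured on the same mini-batch at both points (a common trick in online L-BFGS variants). The function names `two_loop_direction`, `minibatch_grad` and `stochastic_lbfgs`, the test problem, and all parameter values are hypothetical. The fixed step length is a stand-in for the adaptive step length described above. This is a sketch under those assumptions, not the authors' method.

```python
from collections import deque

import numpy as np


def two_loop_direction(grad, history):
    """Return -H @ grad via the L-BFGS two-loop recursion.

    `history` holds (s, y) pairs, oldest first, with s = x_{k+1} - x_k and
    y = g_{k+1} - g_k. The inverse Hessian approximation H is never formed.
    """
    q = grad.copy()
    coeffs = []
    # First loop: newest pair to oldest.
    for s, y in reversed(history):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        coeffs.append((rho, a))
    # Initial scaling gamma = s'y / y'y from the newest pair (identity if empty).
    if history:
        s, y = history[-1]
        q *= (s @ y) / (y @ y)
    # Second loop: oldest pair to newest.
    for (s, y), (rho, a) in zip(history, reversed(coeffs)):
        beta = rho * (y @ q)
        q += (a - beta) * s
    return -q


def minibatch_grad(A, b, x, idx):
    """Gradient of 0.5 * mean((A x - b)^2) over the rows listed in idx."""
    A_batch = A[idx]
    return A_batch.T @ (A_batch @ x - b[idx]) / len(idx)


def stochastic_lbfgs(A, b, memory=5, batch=200, step=0.5, iters=200, seed=0):
    """Hypothetical stochastic L-BFGS loop with a receding (s, y) history."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Receding window: once full, appending drops the oldest pair.
    history = deque(maxlen=memory)
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        g = minibatch_grad(A, b, x, idx)
        # Fixed step length: an assumed stand-in for an adaptive line search.
        x_new = x + step * two_loop_direction(g, history)
        # Curvature pair on the same mini-batch, so y reflects curvature
        # rather than the difference between two independent gradient samples.
        s = x_new - x
        y = minibatch_grad(A, b, x_new, idx) - g
        # Keep the pair only if it satisfies the curvature condition s'y > 0.
        if s @ y > 1e-8 * (s @ s):
            history.append((s, y))
        x = x_new
    return x
```

A possible usage, on synthetic data with a known solution:

```python
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(2000)
x_hat = stochastic_lbfgs(A, b)
print(np.linalg.norm(x_hat - x_true))
```

Bounding the history with `deque(maxlen=...)` keeps memory and per-iteration cost linear in the number of unknowns, which is what makes this family of methods usable when there are millions of them.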