On the Acceleration of L-BFGS with Second-Order Information and Stochastic Batches
This work is an incremental improvement in optimization efficiency, aimed at machine learning practitioners who solve large-scale problems with least-squares and cross-entropy losses.
The paper tackles the instability of L-BFGS under stochastic batches in finite-sum minimization: it uses smooth estimates of the gradient differences together with well-scaled initial Hessians, and numerical experiments support the resulting acceleration.
This paper proposes an L-BFGS framework based on (approximate) second-order information with stochastic batches, as a novel approach to finite-sum minimization problems. Unlike classical L-BFGS, where stochastic batches lead to instability, we use a smooth estimate of the gradient differences while achieving acceleration by well-scaling the initial Hessians. We provide theoretical analyses for both the convex and nonconvex cases. In addition, we demonstrate that for the popular applications of least-squares and cross-entropy losses, the algorithm admits a simple implementation in a distributed environment. Numerical experiments support the efficiency of our algorithms.
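To make the two ingredients named in the abstract concrete, the following is a minimal sketch for a least-squares loss, not the paper's algorithm. It rests on two assumptions that the abstract does not confirm: the "smooth estimate" of the gradient difference is taken to be a mini-batch Hessian-vector product, y = (A_S^T A_S / |S|) s, and the initial inverse Hessian is the common scaling (s^T y / y^T y) I. The function names `two_loop` and `stochastic_lbfgs_least_squares`, and all default parameters, are hypothetical.

```python
import numpy as np


def two_loop(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns H @ grad, where H is the
    inverse-Hessian approximation built from the stored (s, y) pairs."""
    q = np.array(grad, dtype=float)  # float copy of the gradient
    history = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        history.append((alpha, rho, s, y))
    # Scaled initial inverse Hessian gamma * I (assumed scaling, see lead-in).
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    # Second loop: oldest pair to newest pair.
    for alpha, rho, s, y in reversed(history):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return q


def stochastic_lbfgs_least_squares(A, b, steps=100, batch=128, memory=5,
                                   lr=0.5, seed=0):
    """Mini-batch L-BFGS sketch for f(x) = (1/2n) * ||A x - b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    s_list, y_list = [], []
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        A_b, b_b = A[idx], b[idx]
        grad = A_b.T @ (A_b @ x - b_b) / batch  # mini-batch gradient
        x_new = x - lr * two_loop(grad, s_list, y_list)
        s = x_new - x
        # Assumption: replace the difference of two noisy gradients with a
        # batch Hessian-vector product, which is a smooth function of s.
        y = A_b.T @ (A_b @ s) / batch
        # Store the pair only if the curvature condition holds.
        if s @ y > 1e-10 * (s @ s):
            s_list.append(s)
            y_list.append(y)
            if len(s_list) > memory:
                s_list.pop(0)
                y_list.pop(0)
        x = x_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((1000, 5))
    x_true = rng.standard_normal(5)
    x_hat = stochastic_lbfgs_least_squares(A, A @ x_true)
    print("error:", np.linalg.norm(x_hat - x_true))
```

Because y is computed as a Hessian-vector product on a single batch, the curvature s^T y is nonnegative for this loss, which is one way to avoid the instability that arises when gradients from two different batches are differenced. For cross-entropy losses the same structure would apply with the corresponding batch Hessian-vector product in place of `A_b.T @ (A_b @ s)`.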