OCLGMLJan 28, 2019

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

arXiv:1901.09997v551 citations
Originality Incremental advance
AI Analysis

This work addresses optimization efficiency for machine learning practitioners by offering incremental improvements to quasi-Newton methods with better concurrency and data usage.

The paper tackles the problem of stale information in classical quasi-Newton methods for machine learning by proposing sampled variants (LBFGS and LSR1) that use random local sampling instead of sequential past data, resulting in improved performance on classification and neural network tasks.

We present two sampled quasi-Newton methods (sampled LBFGS and sampled LSR1) for solving empirical risk minimization problems that arise in machine learning. Contrary to the classical variants of these methods that sequentially build Hessian or inverse Hessian approximations as the optimization progresses, our proposed methods sample points randomly around the current iterate at every iteration to produce these approximations. As a result, the approximations constructed make use of more reliable (recent and local) information, and do not depend on past iterate information that could be significantly stale. Our proposed algorithms are efficient in terms of accessed data points (epochs) and have enough concurrency to take advantage of parallel/distributed computing environments. We provide convergence guarantees for our proposed methods. Numerical tests on a toy classification problem as well as on popular benchmarking binary classification and neural network training tasks reveal that the methods outperform their classical variants.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes