OC · ML · Jun 17, 2020

Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

arXiv:2006.09606v2 · 5 citations
AI Analysis

This work targets optimization efficiency for machine learning practitioners. It is incremental: it builds on existing quasi-Newton methods and adds structural improvements.

The paper tackles the problem of efficiently incorporating curvature information in stochastic second-order optimization for nonconvex functions. It proposes a structured stochastic quasi-Newton method that uses partial Hessian information and exploits low-rank or Kronecker-product structure. The authors report competitive performance on tasks such as logistic regression and deep neural networks.

In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. A key ingredient is a cheap yet effective scheme for incorporating local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses as much partial Hessian information as possible. Exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations makes the computation of the quasi-Newton direction affordable. Global convergence to a stationary point and a local superlinear convergence rate are established under mild assumptions. Numerical results on logistic regression, deep autoencoder networks, and deep convolutional neural networks show that the proposed method is competitive with state-of-the-art methods.
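
The idea of splitting the Hessian into a cheap part and an expensive part can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a deterministic, full-batch toy in which the cheap part is used exactly and only the expensive part is approximated by a standard BFGS update driven by a structured secant residual. The function name `structured_bfgs`, the test objective, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def structured_bfgs(f, grad, cheap_hess, x0, iters=100, tol=1e-6):
    """Toy structured quasi-Newton loop (illustrative, not the paper's method).

    The Hessian model is B = C + A, where C = cheap_hess(x) is used exactly
    and A is a BFGS approximation of the remaining (expensive) part.
    """
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)              # initial guess for the expensive part
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        C = cheap_hess(x)
        d = -np.linalg.solve(C + A, g)      # quasi-Newton direction
        # Backtracking (Armijo) line search, capped at 30 halvings.
        t, fx, slope = 1.0, f(x), g @ d
        for _bt in range(30):
            if f(x + t * d) <= fx + 1e-4 * t * slope:
                break
            t *= 0.5
        s = t * d
        x_new = x + s
        g_new = grad(x_new)
        # Structured secant residual: remove the curvature that C already explains.
        y = (g_new - g) - C @ s
        sy = s @ y
        # Update A only when the curvature condition holds (keeps A positive definite).
        if sy > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            As = A @ s
            A = A - np.outer(As, As) / (s @ As) + np.outer(y, y) / sy
        x, g = x_new, g_new
    return x

# Hypothetical test problem: quadratic (cheap Hessian Q) plus a log-cosh term
# whose Hessian plays the role of the "expensive" part.
Q = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, -2.0, 3.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x + np.sum(np.log(np.cosh(x)))

def grad(x):
    return Q @ x - b + np.tanh(x)

x_star = structured_bfgs(f, grad, lambda x: Q, np.zeros(3))
print(np.linalg.norm(grad(x_star)))
```

Because the cheap part `Q` enters the model exactly, the BFGS matrix only has to learn the curvature of the log-cosh term. A stochastic variant would replace `grad` and the secant pair with mini-batch estimates, which this sketch does not attempt.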

