LG · ML · Sep 28, 2020

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

arXiv:2009.13586v6 · 35 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses optimization challenges in machine learning, particularly for nonconvex problems, by offering an efficient quasi-Newton approach. The contribution appears incremental because it builds on existing adaptive methods.

The paper tackles nonconvex stochastic optimization by introducing Apollo, a quasi-Newton method that approximates the Hessian with a diagonal matrix for efficiency, and it shows significant improvements in convergence speed and generalization over methods like SGD and Adam in vision and language tasks.

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as in adaptive first-order optimization methods, with linear complexity in both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Experiments on three tasks of vision and language show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
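The two ideas named in the abstract, a diagonal Hessian approximation and its rectified absolute value, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the weak secant update rule chosen here, and the default values of `lr`, `sigma`, and `eps` are all assumptions made for illustration; the sketch also omits the gradient averaging that a stochastic optimizer would typically use.

```python
import numpy as np


def rectified_abs(b_diag, sigma=1.0):
    """Elementwise max(|b|, sigma): a strictly positive diagonal.

    sigma is a hypothetical lower bound on the curvature estimate.
    """
    return np.maximum(np.abs(b_diag), sigma)


def init_state(n):
    """Start with a zero diagonal and no history (illustrative layout)."""
    return {"b": np.zeros(n), "s": None, "g": None}


def diag_quasi_newton_step(theta, grad, state, lr=0.1, sigma=1.0, eps=1e-8):
    """One step of a simplified diagonal quasi-Newton update.

    Assumption: the diagonal b is corrected so that the weak secant
    condition s^T B s = s^T y holds, where s is the previous parameter
    displacement and y is the change in gradient.
    """
    b = state["b"]
    if state["g"] is not None:
        s = state["s"]
        y = grad - state["g"]
        # B_new = B + alpha * diag(s^2); alpha solves the weak secant condition.
        alpha = (s @ y - s @ (b * s)) / (np.sum(s ** 4) + eps)
        b = b + alpha * s ** 2
    # Precondition the gradient with the rectified absolute diagonal.
    direction = grad / rectified_abs(b, sigma)
    new_theta = theta - lr * direction
    # Only O(n) vectors are stored: the diagonal, the last step, the last gradient.
    return new_theta, {"b": b, "s": new_theta - theta, "g": grad}


# Usage on a toy quadratic f(x) = 0.5 * sum(h * x**2) with gradient h * x.
h = np.array([1.0, 10.0])
theta = np.array([1.0, 1.0])
state = init_state(theta.size)
for _ in range(5):
    theta, state = diag_quasi_newton_step(theta, h * theta, state)
```

Every quantity in the state is a vector of the same length as the parameters, which is where the linear time and memory cost comes from. Taking `max(|b|, sigma)` keeps the preconditioner strictly positive even when the estimated curvature is negative or close to zero.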

Code Implementations: 3 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
