ML, LG · Mar 26, 2018

Online Second Order Methods for Non-Convex Stochastic Optimizations

arXiv:1803.09383v3 · 4 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses optimization challenges in deep learning, offering incremental improvements to existing methods for researchers and practitioners.

The paper tackles non-convex stochastic optimization by proposing online second-order methods based on preconditioned stochastic gradient descent (PSGD). It improves on the original PSGD implementation with new preconditioners and better numerical stability, and demonstrates advantages in generalization performance and convergence speed on benchmark neural network problems.

This paper proposes a family of online second-order methods for possibly non-convex stochastic optimization problems based on the theory of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method able to handle gradient noise and non-convexity simultaneously. We have improved the implementation of the original PSGD in several ways, e.g., new forms of preconditioners, more accurate Hessian-vector product calculations, and better numerical stability when the Hessian is vanishing or ill-conditioned. We have also revealed the relationship between feature normalization and PSGD with Kronecker product preconditioners, which explains the excellent performance of Kronecker product preconditioners in deep neural network learning. A software package (https://github.com/lixilinx/psgd_tf) implemented in TensorFlow is provided to compare variations of stochastic gradient descent (SGD) and PSGD with five different preconditioners on a wide range of benchmark problems with commonly used neural network architectures, e.g., convolutional and recurrent neural networks. Experimental results clearly demonstrate the advantages of PSGD in terms of generalization performance and convergence speed.
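To make the PSGD idea more concrete, here is a minimal NumPy sketch of online fitting for the simplest, dense preconditioner. It does not cover the Kronecker product forms the paper emphasizes. As we understand the original PSGD formulation, the preconditioner P = Q^T Q (Q upper triangular) is fitted to minimize E[δg^T P δg + δθ^T P^{-1} δθ], where δg is the gradient change caused by a small parameter perturbation δθ. Q is updated with a normalized relative-gradient step. The function names `update_precond_dense` and `fit_preconditioner` and the toy quadratic problem are hypothetical and are not the psgd_tf API. For exact Hessian-vector products δg = H δθ with Gaussian probes δθ, the minimizer is P = |H|^{-1}, which stays positive definite even when H is indefinite.

```python
import numpy as np


def update_precond_dense(Q, dtheta, dg, step=0.01, tiny=1e-30):
    """One online fitting step for a dense preconditioner P = Q^T Q.

    Q is upper triangular. The step lowers the PSGD fitting criterion
    E[dg^T P dg + dtheta^T P^{-1} dtheta] along a relative (Lie group)
    gradient, which keeps Q upper triangular and, for step < 1, invertible.
    """
    a = Q @ dg                          # a = Q dg
    b = np.linalg.solve(Q.T, dtheta)    # b = Q^{-T} dtheta
    grad = np.triu(np.outer(a, a) - np.outer(b, b))
    # Normalize so the largest entry of mu * grad equals `step`.
    mu = step / (np.max(np.abs(grad)) + tiny)
    return Q - mu * grad @ Q


def fit_preconditioner(H, num_iters=20000, step=0.01, seed=0):
    """Fit P from random probes dtheta and Hessian-vector products dg = H dtheta."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    Q = np.eye(n)
    for _ in range(num_iters):
        dtheta = rng.standard_normal(n)
        dg = H @ dtheta   # exact Hessian-vector product for a quadratic loss
        Q = update_precond_dense(Q, dtheta, dg, step)
    return Q.T @ Q


# Hypothetical test problem: a non-convex quadratic loss 0.5 * theta^T H theta
# whose Hessian H is symmetric but indefinite.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigs = np.array([-10.0, -2.0, 1.0, 3.0, 10.0])
H = U @ np.diag(eigs) @ U.T
abs_H = U @ np.diag(np.abs(eigs)) @ U.T   # |H|; the ideal P is its inverse

P = fit_preconditioner(H)
err_init = np.linalg.norm(abs_H - np.eye(5))        # error with P = I
err_final = np.linalg.norm(P @ abs_H - np.eye(5))
print(f"||P|H| - I||_F: {err_init:.2f} at start, {err_final:.2f} after fitting")
print("P positive definite:", bool(np.all(np.linalg.eigvalsh(P) > 0)))
```

In a real network, `dg` would come from a Hessian-vector product computed by automatic differentiation or from a difference of two gradients, rather than from an explicit H. With an exact P = |H|^{-1}, the step theta - P @ (H @ theta) zeroes the components of theta along positive-curvature directions and doubles those along negative-curvature directions. The preconditioned update therefore moves away from the saddle point at the origin, which is one way to read the claim that PSGD handles non-convexity.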

Code Implementations: 1 repo