OC · AI · LG · ML · Jun 14, 2021

NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

arXiv:2106.07454v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses slow convergence in deep learning optimization. It is aimed at researchers and practitioners and offers an incremental improvement over existing natural gradient methods.

The paper tackles the computational inefficiency of second-order optimization methods in deep learning by proposing NG+, a multi-step matrix-product natural gradient method that uses a generalized Fisher information matrix to maintain low computational cost comparable to first-order methods. Numerical results show NG+ is competitive with state-of-the-art methods across tasks like image classification and neural machine translation.

In this paper, a novel second-order method called NG+ is proposed. Following the rule "the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using products of gradients in matrix form rather than the traditional vectorization. Our generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse are kept the same for multiple steps, so the computational cost is controlled and comparable to that of first-order methods. Global convergence is established under some mild conditions, and a regret bound is given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer, and recommendation systems with DLRM illustrate that NG+ is competitive with state-of-the-art methods.
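The update described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the left-sided product G_i G_i^T, the averaging over samples, the damping term, the refresh interval, and the toy least-squares problem are all assumptions made here for illustration, and the names `gfim`, `per_sample_grads`, and `train` are hypothetical.

```python
import numpy as np

def per_sample_grads(W, X, Y):
    """Per-sample gradients of 0.5 * ||W x_i - y_i||^2 with respect to W.

    Each gradient keeps the matrix shape of W (m x n), rather than being vectorized.
    """
    R = X @ W.T - Y                        # residuals, shape (batch, m)
    return R[:, :, None] * X[:, None, :]   # shape (batch, m, n)

def gfim(grads, damping):
    """Assumed GFIM form: mean of G_i G_i^T over samples, plus damping * I (m x m)."""
    m = grads.shape[1]
    F = np.einsum("bij,bkj->ik", grads, grads) / grads.shape[0]
    return F + damping * np.eye(m)

def train(W, X, Y, steps=20, lr=0.1, damping=1.0, refresh_every=5):
    """Multi-step scheme: the inverse GFIM is recomputed only every `refresh_every` steps."""
    F_inv = None
    n_inversions = 0
    for t in range(steps):
        G_i = per_sample_grads(W, X, Y)
        G = G_i.mean(axis=0)               # mini-batch gradient, same shape as W
        if t % refresh_every == 0:
            F_inv = np.linalg.inv(gfim(G_i, damping))
            n_inversions += 1
        W = W - lr * (F_inv @ G)           # direction has the shape of the parameter
    return W, n_inversions

# Toy usage on synthetic, noiseless data (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
W_true = rng.normal(size=(2, 3))
Y = X @ W_true.T
W0 = np.zeros((2, 3))
W, n_inversions = train(W0, X, Y)
```

For an m × n parameter, the matrix inverted here is only m × m, whereas the Fisher matrix of the vectorized parameter would be mn × mn. Together with reusing the inverse across several steps, this is the kind of saving the abstract refers to when it compares the cost to first-order methods.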

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
