LG · Jun 7, 2021

TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

arXiv:2106.03947v4 · 11 citations
Originality: Highly original
AI Analysis

This work addresses the time-efficiency problem faced by practitioners using NGD in deep learning. It offers a novel method that improves on approximate approaches such as KFAC, although its advance over existing NGD techniques is incremental.

The paper tackles the high computational cost of inverting the Fisher information matrix in Natural Gradient Descent (NGD). It proposes TENGraD, which inverts Fisher blocks exactly and uses an efficient factorization to preserve curvature information. TENGraD comes with linear convergence guarantees and, in wall-clock time, outperforms state-of-the-art NGD methods and often stochastic gradient descent on image classification tasks (CIFAR-10, CIFAR-100, and Fashion-MNIST).
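A minimal derivation of why an exact Fisher-block inverse can be cheap when the mini-batch is small. It assumes that the block for layer $\ell$ is an empirical Fisher built from $n$ per-sample gradients (the columns of $U_\ell$) and damped by $\lambda > 0$; all symbols here are introduced for illustration and are not the paper's notation. For a fully connected layer with inputs $a_i$ (columns of $A$) and output gradients $\delta_i$ (columns of $\Delta$), each per-sample gradient is $u_i = \operatorname{vec}(\delta_i a_i^\top)$:

```latex
F_\ell = \tfrac{1}{n} U_\ell U_\ell^\top, \qquad U_\ell \in \mathbb{R}^{d_\ell \times n}

(F_\ell + \lambda I_{d_\ell})^{-1} g
  = \tfrac{1}{\lambda}\Big( g - U_\ell \big(n\lambda I_n + U_\ell^\top U_\ell\big)^{-1} U_\ell^\top g \Big)
  \quad \text{(Woodbury identity)}

\big(U_\ell^\top U_\ell\big)_{ij} = u_i^\top u_j = (a_i^\top a_j)(\delta_i^\top \delta_j)
  \;\Longrightarrow\; U_\ell^\top U_\ell = (A^\top A) \odot (\Delta^\top \Delta)
```

The right-hand side requires only an $n \times n$ solve in place of a $d_\ell \times d_\ell$ inversion, and the Gram matrix never requires forming $U_\ell$ explicitly.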

This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practicality by approximating the Fisher matrix to reduce its inversion cost. However, these approximations do not significantly reduce the overall time, and they lead to less accurate parameter updates and a loss of curvature information. TENGraD improves the time efficiency of NGD by computing Fisher block inverses with a computationally efficient covariance factorization and reuse method. It computes the inverse of each block exactly using the Woodbury matrix identity, preserving curvature information while admitting fast (linear) convergence rates. Our experiments on image classification tasks with state-of-the-art deep neural architectures on CIFAR-10, CIFAR-100, and Fashion-MNIST show that TENGraD significantly outperforms state-of-the-art NGD methods and often stochastic gradient descent in wall-clock time.
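The combination of exact block inversion, the Woodbury identity, and a factorized covariance can be illustrated with a minimal NumPy sketch. It assumes an empirical Fisher block for a single fully connected layer, built from n per-sample gradients of the form delta_i a_i^T and damped by a scalar `damping`. The function names are hypothetical, and this is not the authors' TENGraD implementation: it omits the reuse step and any handling of convolutional layers.

```python
import numpy as np


def woodbury_block_direction(acts, out_grads, grad_w, damping):
    """Hypothetical sketch: (F + damping*I)^{-1} g for ONE fully connected layer,
    where F = (1/n) sum_i u_i u_i^T and u_i = vec(delta_i a_i^T), computed
    without ever forming the (d_out*d_in) x (d_out*d_in) Fisher block.

    acts:      (n, d_in)     layer inputs a_i for the n mini-batch samples
    out_grads: (n, d_out)    per-sample gradients delta_i w.r.t. the layer output
    grad_w:    (d_out, d_in) gradient matrix to precondition
    damping:   scalar lambda > 0 added to the Fisher block
    """
    n = acts.shape[0]
    # Gram matrix of per-sample weight gradients:
    # <u_i, u_j> = (a_i . a_j) * (delta_i . delta_j), a Hadamard product of small Grams.
    gram = (acts @ acts.T) * (out_grads @ out_grads.T)          # (n, n)
    # U^T g without forming U: entry i equals delta_i^T G a_i.
    ut_g = np.einsum("ij,jk,ik->i", out_grads, grad_w, acts)   # (n,)
    # Woodbury identity: only an n x n linear system is solved.
    coeffs = np.linalg.solve(gram + n * damping * np.eye(n), ut_g)
    # U @ coeffs, reshaped to a matrix: sum_i c_i delta_i a_i^T.
    correction = out_grads.T @ (coeffs[:, None] * acts)       # (d_out, d_in)
    return (grad_w - correction) / damping


def dense_block_direction(acts, out_grads, grad_w, damping):
    """Reference for toy sizes: build F = U U^T / n explicitly and solve."""
    n = acts.shape[0]
    # Columns of U are the flattened per-sample gradients (row-major, like grad_w.ravel()).
    U = np.stack([np.outer(out_grads[i], acts[i]).ravel() for i in range(n)], axis=1)
    fisher = U @ U.T / n
    d = fisher.shape[0]
    x = np.linalg.solve(fisher + damping * np.eye(d), grad_w.ravel())
    return x.reshape(grad_w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_in, d_out = 8, 5, 3
    acts = rng.standard_normal((n, d_in))
    out_grads = rng.standard_normal((n, d_out))
    grad_w = rng.standard_normal((d_out, d_in))
    fast = woodbury_block_direction(acts, out_grads, grad_w, damping=0.1)
    slow = dense_block_direction(acts, out_grads, grad_w, damping=0.1)
    print("max abs difference:", np.abs(fast - slow).max())
```

The dense reference is there only to check the sketch at a toy size. The Woodbury form solves an n × n system rather than a (d_out·d_in) × (d_out·d_in) one, which is where the savings come from when the mini-batch is small relative to the layer.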

Code Implementations: 1 repo
