LG · ML · Jun 11, 2018

Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis

arXiv:1806.03884v2 · 197 citations
Originality: Highly original
AI Analysis

This work addresses the problem of inefficient optimization in deep learning for researchers and practitioners, offering an incremental improvement over existing methods.

The paper tackles the computational cost of using gradient covariance in natural gradient descent for large models. It proposes a novel approximation that is provably better than KFAC and allows cheap partial updates, and it reports improvements in optimization speed for several deep network architectures.

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
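The idea in the abstract can be sketched for a single fully connected layer. This is a minimal NumPy illustration, not the authors' implementation. It assumes the usual Kronecker factors A = E[a aᵀ] (layer inputs) and G = E[g gᵀ] (gradients with respect to layer outputs), and the function names, the `damping` parameter, and the array layouts are hypothetical choices made for this example.

```python
import numpy as np

def kronecker_eigenbasis(a, g):
    """Eigendecompose the two Kronecker factors of one linear layer.

    a: (n, d_in) layer inputs, g: (n, d_out) gradients w.r.t. layer outputs.
    Returns the eigenvectors U_A, U_G and the eigenvalues s_A, s_G.
    """
    n = a.shape[0]
    A = a.T @ a / n          # input second-moment factor
    G = g.T @ g / n          # output-gradient second-moment factor
    s_A, U_A = np.linalg.eigh(A)
    s_G, U_G = np.linalg.eigh(G)
    return U_A, U_G, s_A, s_G

def eigenbasis_second_moments(a, g, U_A, U_G):
    """Diagonal second moments of per-example gradients in the eigenbasis.

    The per-example weight gradient is outer(g_i, a_i). Rotated into the
    Kronecker-factored eigenbasis it becomes outer(U_G.T g_i, U_A.T a_i),
    so its elementwise square is the outer product of the squared projections.
    """
    a_proj = a @ U_A         # (n, d_in)
    g_proj = g @ U_G         # (n, d_out)
    return (g_proj ** 2).T @ (a_proj ** 2) / a.shape[0]   # (d_out, d_in)

def precondition(grad_W, U_A, U_G, scaling, damping=1e-3):
    """Rotate the gradient into the eigenbasis, rescale, and rotate back."""
    rotated = U_G.T @ grad_W @ U_A
    return U_G @ (rotated / (scaling + damping)) @ U_A.T
```

In this sketch, KFAC corresponds to using `np.outer(s_G, s_A)` as the scaling, while the approach described above replaces it with the second moments measured directly in the eigenbasis. Those rescaling terms are cheap to re-estimate between the more expensive eigendecompositions, which is one way to read the abstract's "cheap partial updates".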

Code Implementations: 6 repos
