LG · ML · May 29, 2019

Limitations of the Empirical Fisher Approximation for Natural Gradient Descent

arXiv:1905.12558v3 · 278 citations
Originality: Incremental advance
AI Analysis

This work addresses a theoretical limitation for researchers and practitioners using approximate second-order optimization methods, highlighting potential pitfalls in widely adopted heuristics.

The paper disputes the use of the empirical Fisher approximation in natural gradient descent by showing it fails to capture second-order information and is unlikely to match the true Fisher or Hessian in practice, leading to undesirable effects even on simple problems.

Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation known as the empirical Fisher, drawing connections between approximate second-order methods and heuristics like Adam. We dispute this argument by showing that the empirical Fisher, unlike the Fisher, does not generally capture second-order information. We further argue that the conditions under which the empirical Fisher approaches the Fisher (and the Hessian) are unlikely to be met in practice, and that, even on simple optimization problems, the pathologies of the empirical Fisher can have undesirable effects.
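The contrast drawn in the abstract can be made concrete with a minimal sketch. It is not taken from the paper: the one-parameter model y ~ N(theta * x, 1), the toy data, and all function names are assumptions chosen for illustration. For this model the Fisher is the sum of x^2 and coincides with the Hessian of the negative log-likelihood, whereas the empirical Fisher is the sum of squared per-example gradients and therefore depends on the residuals.

```python
# Illustrative sketch only: scalar linear regression, y ~ N(theta * x, 1).
# Model, data, and function names are assumptions, not the paper's experiments.

def grad_nll(theta, xs, ys):
    """Gradient of the summed negative log-likelihood with respect to theta."""
    return -sum((y - theta * x) * x for x, y in zip(xs, ys))

def fisher(xs):
    """Fisher information for this model: sum of x^2 (independent of y and theta)."""
    return sum(x * x for x in xs)

def empirical_fisher(theta, xs, ys):
    """Sum of squared per-example log-likelihood gradients at the observed labels."""
    return sum(((y - theta * x) * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]
theta0 = 0.0

# Closed-form least-squares minimizer, used as the reference point.
theta_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

g = grad_nll(theta0, xs, ys)
F = fisher(xs)
EF = empirical_fisher(theta0, xs, ys)

# One preconditioned step with each matrix (scalars here), unit step size.
theta_ngd = theta0 - g / F   # natural gradient step with the Fisher
theta_ef = theta0 - g / EF   # same step with the empirical Fisher

print("minimizer:            ", theta_star)
print("Fisher step lands at: ", theta_ngd)
print("emp. Fisher step at:  ", theta_ef)
print("Fisher at optimum:    ", fisher(xs))
print("emp. Fisher at optimum:", empirical_fisher(theta_star, xs, ys))
```

In this sketch the Fisher-preconditioned step reaches the minimizer in a single update because the objective is quadratic and the Fisher equals its curvature. The empirical Fisher is large far from the optimum and shrinks toward zero as the residuals shrink, so it tracks gradient magnitude rather than curvature.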

Code Implementations: 1 repo
