LG, DG, OC, ML | Aug 30, 2018

A Coordinate-Free Construction of Scalable Natural Gradient

arXiv:1808.10340v1 | 11 citations
Originality: Incremental advance
AI Analysis

This work addresses parameterization sensitivity in neural network optimization, a concern for both researchers and practitioners. It is an incremental advance because it builds on the existing K-FAC method.

The paper asks whether tractable approximations to natural gradient descent retain its invariance properties. It constructs the Kronecker-Factored Approximate Curvature (K-FAC) algorithm in a coordinate-free way, shows that the K-FAC update matches the natural gradient under a specific Riemannian metric, and extends the analysis to convolutional and recurrent networks and to metrics other than the Fisher metric.

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
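
The affine-invariance claim in the abstract can be checked numerically for a single fully connected layer. The following is a minimal NumPy sketch, not the authors' code. It assumes the usual K-FAC factorization for a dense layer (update direction G^{-1} (dL/dW) A^{-1}, with A the second moment of the input activations and G the second moment of the pre-activation gradients), omits damping, and uses random stand-ins for the activations and gradients. All names (`kfac_direction`, `Omega_bar`, and so on) are hypothetical. The bias is folded into the weights through homogeneous coordinates, so an affine map of the activations becomes a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 512, 4, 3

# Hypothetical dense layer s = W a + bias, written as s = W_bar @ [a; 1].
a = rng.normal(size=(n, d_in))                  # input activations, one row per example
a_bar = np.hstack([a, np.ones((n, 1))])         # homogeneous activations [a; 1]
g = rng.normal(size=(n, d_out))                 # stand-in for dL/ds per example
W_bar = rng.normal(size=(d_out, d_in + 1))      # weights with the bias as last column


def kfac_direction(a_bar, g):
    """K-FAC direction G^{-1} (dL/dW) A^{-1} for one dense layer, no damping."""
    n = a_bar.shape[0]
    grad = g.T @ a_bar / n                      # E[g a^T], shape (d_out, d_in + 1)
    A = a_bar.T @ a_bar / n                     # E[a a^T]
    G = g.T @ g / n                             # E[g g^T]
    # A is symmetric, so grad A^{-1} = (A^{-1} grad^T)^T.
    return np.linalg.solve(G, np.linalg.solve(A, grad.T).T)


# Affine map a -> Omega a + b, expressed as a linear map on [a; 1].
Omega = np.eye(d_in) + 0.1 * rng.normal(size=(d_in, d_in))
b = rng.normal(size=d_in)
Omega_bar = np.eye(d_in + 1)
Omega_bar[:d_in, :d_in] = Omega
Omega_bar[:d_in, d_in] = b

# Reparameterized layer: transformed activations, compensating weights.
a_bar_t = a_bar @ Omega_bar.T
W_bar_t = W_bar @ np.linalg.inv(Omega_bar)
# Both parameterizations compute the same pre-activations, so g is shared.
same_function = np.allclose(a_bar_t @ W_bar_t.T, a_bar @ W_bar.T)

lr = 0.5

# K-FAC step in each parameterization, then map the second back to compare.
W_kfac = W_bar - lr * kfac_direction(a_bar, g)
W_kfac_t = W_bar_t - lr * kfac_direction(a_bar_t, g)
kfac_invariant = np.allclose(W_kfac_t @ Omega_bar, W_kfac)

# Plain gradient step for comparison.
W_sgd = W_bar - lr * (g.T @ a_bar / n)
W_sgd_t = W_bar_t - lr * (g.T @ a_bar_t / n)
sgd_invariant = np.allclose(W_sgd_t @ Omega_bar, W_sgd)

print("same function before update:", same_function)
print("K-FAC step invariant:", kfac_invariant)
print("gradient step invariant:", sgd_invariant)
```

In this sketch the two K-FAC updates describe the same layer after the step, because the factor Omega_bar cancels between the transformed gradient and the transformed activation moment. The plain gradient step picks up an extra factor of Omega_bar^T Omega_bar and therefore depends on the parameterization.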
