LG | NA | Jan 16, 2013

Revisiting Natural Gradient for Deep Networks

arXiv:1301.3584v7 | 436 citations
Originality: Synthesis-oriented
AI Analysis

This work provides incremental improvements to optimization methods for deep learning practitioners.

This paper revisits the natural gradient algorithm for training deep networks, showing connections to three other optimization methods and extending it to incorporate second-order information with a truncated Newton approach.

We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods for training deep models: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012) and TONGA (Le Roux et al., 2008). We describe how one can use unlabeled data to improve the generalization error obtained by natural gradient, and we empirically evaluate the robustness of the algorithm to the ordering of the training set compared to stochastic gradient descent. Finally, we extend natural gradient to incorporate second-order information alongside the manifold information, and we benchmark the new algorithm using a truncated Newton approach to invert the metric matrix instead of a diagonal approximation of it.
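To make the truncated Newton idea concrete, here is a minimal sketch of one natural gradient step in which the metric (Fisher) matrix is never formed or inverted explicitly. The system F d = g is instead solved approximately with a few conjugate gradient iterations. This is an illustration under stated assumptions, not the paper's implementation: it uses logistic regression rather than a deep network, and the function names (`fisher_vector_product`, `conjugate_gradient`, `natural_gradient_step`), the damping constant, and the iteration limit are all hypothetical choices. Because the metric for this model depends only on the inputs and the model's own predictions, the sketch accepts a separate `X_metric` array, which could hold unlabeled examples.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fisher_vector_product(theta, X, v, damping=1e-3):
    """Return (F + damping * I) @ v for logistic regression without forming F.

    F is the Fisher metric averaged over the rows of X. No labels are needed.
    The damping term is an assumption added to keep the system well conditioned.
    """
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)  # variance of the model's Bernoulli output per example
    return X.T @ (w * (X @ v)) / X.shape[0] + damping * v


def conjugate_gradient(matvec, b, max_iters=20, tol=1e-10):
    """Approximately solve A x = b for symmetric positive definite A.

    Only matrix-vector products are used; max_iters truncates the solve.
    """
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x, with x = 0 initially
    d = r.copy()  # search direction
    rs = r @ r
    for _ in range(max_iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x


def natural_gradient_step(theta, X, y, X_metric, lr=1.0):
    """One update theta - lr * F^{-1} grad, with F^{-1} grad obtained by CG.

    X, y: labeled batch used for the gradient.
    X_metric: inputs used for the metric (may be unlabeled data).
    """
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) / X.shape[0]  # gradient of the mean negative log-likelihood
    direction = conjugate_gradient(
        lambda v: fisher_vector_product(theta, X_metric, v), grad
    )
    return theta - lr * direction
```

A diagonal approximation would instead divide each gradient component by the corresponding diagonal entry of F. The matrix-free solve above keeps the off-diagonal structure at the cost of a few extra matrix-vector products per step.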

