LG · AI · Sep 21, 2021

A Novel Structured Natural Gradient Descent for Deep Learning

arXiv:2109.10100v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses an optimization bottleneck for deep learning practitioners working with large networks, though it appears to be an incremental improvement over existing natural gradient methods.

The paper tackles the cost of computing the Fisher information matrix in natural gradient descent for large deep neural networks. It proposes a network reconstruction method that reproduces the effect of natural gradient optimization while training with traditional gradient descent. Experimental results show that the method accelerates convergence and achieves better performance than gradient descent while retaining its computational simplicity.
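For reference, the standard natural gradient update and Fisher information matrix can be written as follows. The notation (parameters $\theta \in \mathbb{R}^d$, step size $\eta$, loss $L$, model distribution $p_\theta$) is assumed here for illustration and is not taken from the paper.

```latex
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{x,\; y \sim p_\theta(\cdot \mid x)}
  \left[ \nabla_\theta \log p_\theta(y \mid x)\,
         \nabla_\theta \log p_\theta(y \mid x)^{\top} \right]
```

Because $F$ is a $d \times d$ matrix, storing it takes $O(d^2)$ memory and inverting it directly takes $O(d^3)$ time, which is the bottleneck for networks with millions of parameters.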

Natural gradient descent (NGD) has provided deep insights and powerful tools for deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as the network grows large and complex. This paper proposes a new optimization method whose main idea is to accurately replace natural gradient optimization by reconstructing the network. More specifically, we reconstruct the structure of the deep neural network and optimize the new network using traditional gradient descent (GD). The reconstructed network achieves the effect of optimization with natural gradient descent. Experimental results show that our optimization method can accelerate the convergence of deep network models and achieve better performance than GD while sharing its computational simplicity.
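The general idea that plain GD on a restructured model can reproduce an NGD step can be illustrated on a toy problem. The sketch below is not the paper's reconstruction method, which the abstract does not specify. It assumes a two-parameter linear regression with a unit-variance Gaussian likelihood, so the Fisher matrix is F = mean(x xᵀ), and it "reconstructs" the model by feeding whitened inputs z = L⁻¹x, where F = L Lᵀ. All names (`ngd_step`, `whiten`, `to_theta`, `run`) and the data are hypothetical.

```python
import math

# Toy data (hypothetical): y = 2*x1 - 3*x2 with correlated features.
X = [(1.0, 0.5), (2.0, 1.5), (3.0, 1.0), (4.0, 3.0)]
Y = [2.0 * a - 3.0 * b for a, b in X]
N = len(X)

def grad(w, inputs):
    """Gradient of 0.5 * mean((w . x - y)^2) with respect to w."""
    g = [0.0, 0.0]
    for x, y in zip(inputs, Y):
        r = w[0] * x[0] + w[1] * x[1] - y
        g[0] += r * x[0] / N
        g[1] += r * x[1] / N
    return g

# Fisher matrix under the assumed unit-variance Gaussian model: F = mean(x x^T).
f11 = sum(a * a for a, _ in X) / N
f12 = sum(a * b for a, b in X) / N
f22 = sum(b * b for _, b in X) / N

def ngd_step(theta, lr):
    """Explicit natural gradient step: theta - lr * F^{-1} g (2x2 inverse)."""
    g = grad(theta, X)
    det = f11 * f22 - f12 * f12
    return [theta[0] - lr * (f22 * g[0] - f12 * g[1]) / det,
            theta[1] - lr * (-f12 * g[0] + f11 * g[1]) / det]

# Cholesky factor F = L L^T, with L lower triangular.
l11 = math.sqrt(f11)
l21 = f12 / l11
l22 = math.sqrt(f22 - l21 * l21)

def whiten(x):
    """z = L^{-1} x by forward substitution."""
    z1 = x[0] / l11
    z2 = (x[1] - l21 * z1) / l22
    return (z1, z2)

# The "reconstructed" model is the same linear map applied to whitened inputs.
Z = [whiten(x) for x in X]

def to_theta(phi):
    """Map reconstructed weights back: theta = L^{-T} phi (back substitution)."""
    t2 = phi[1] / l22
    t1 = (phi[0] - l21 * t2) / l11
    return [t1, t2]

def run(lr, steps):
    theta = [0.0, 0.0]  # NGD iterate in the original model
    phi = [0.0, 0.0]    # plain GD iterate in the reconstructed model
    for _ in range(steps):
        theta = ngd_step(theta, lr)
        g = grad(phi, Z)
        phi = [phi[0] - lr * g[0], phi[1] - lr * g[1]]
    return theta, to_theta(phi)

if __name__ == "__main__":
    ngd, rec = run(lr=0.5, steps=5)
    print("NGD on original model:    ", ngd)
    print("GD on reconstructed model:", rec)
```

In this linear setting the two trajectories coincide up to floating-point error, because a GD step on φ maps back to θ − η L⁻ᵀL⁻¹g = θ − η F⁻¹g. For a deep network the Fisher matrix depends on the parameters and no single fixed whitening suffices, so this sketch shows only the underlying principle.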
