LG · AI · Sep 21, 2021

A Novel Structured Natural Gradient Descent for Deep Learning

arXiv:2109.10100v13.11 citations

Originality: Incremental advance

AI Analysis

This work targets an optimization bottleneck faced by practitioners training large networks, though it appears to be an incremental improvement over existing natural gradient methods.

The paper tackles the cost of computing the Fisher information matrix that natural gradient descent requires in large deep neural networks. It proposes reconstructing the network so that training the new network with ordinary gradient descent reproduces the effect of natural gradient optimization. Experimental results show that this method accelerates convergence and outperforms gradient descent while keeping its computational simplicity.
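For reference, the standard natural gradient update is sketched below. The notation (parameters θ with P entries, learning rate η, loss L, Fisher matrix F) is ours rather than the paper's, and the expectation is the usual model-sampled Fisher.

```latex
F(\theta) = \mathbb{E}_{x,\; y \sim p(y \mid x, \theta)}\!\left[
  \nabla_\theta \log p(y \mid x, \theta)\,
  \nabla_\theta \log p(y \mid x, \theta)^{\top}
\right] \in \mathbb{R}^{P \times P},
\qquad
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t)
```

Storing F takes O(P^2) memory and inverting it takes O(P^3) time, which is what makes exact NGD impractical once P reaches the millions of parameters typical of deep networks.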

Natural gradient descent (NGD) provides deep insights and powerful tools for deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as networks grow larger and more complex. This paper proposes a new optimization method whose main idea is to replace natural gradient optimization accurately by reconstructing the network. More specifically, we reconstruct the structure of the deep neural network and optimize the new network using traditional gradient descent (GD). Optimizing the reconstructed network with GD achieves the same effect as optimizing the original network with NGD. Experimental results show that our optimization method can accelerate the convergence of deep network models and achieve better performance than GD while sharing its computational simplicity.
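The abstract does not spell out how the network is reconstructed. The sketch below illustrates only the general principle that a reparameterization can make plain GD behave like NGD; it is not the authors' construction. It assumes a toy linear-Gaussian regression model, whose Fisher matrix F = E[x xᵀ] is constant in θ. Substituting θ = F^{-1/2} φ gives ∇φ L = F^{-1/2} ∇θ L by the chain rule, so one GD step on φ, mapped back to θ, equals one NGD step on θ. All data and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: linear regression with unit-variance Gaussian noise.
# For p(y | x, theta) = N(x^T theta, 1) the Fisher matrix is E[x x^T],
# which does not depend on theta.
n, p = 200, 5
X = rng.normal(size=(n, p)) @ np.diag([1.0, 3.0, 0.5, 2.0, 0.1])  # badly scaled features
theta_true = rng.normal(size=p)
y = X @ theta_true + 0.1 * rng.normal(size=n)


def grad(theta):
    """Gradient of the loss 0.5 * mean((X @ theta - y) ** 2) w.r.t. theta."""
    return X.T @ (X @ theta - y) / n


F = X.T @ X / n  # empirical Fisher matrix for this model (constant in theta)

# Symmetric inverse square root F^{-1/2} via eigendecomposition.
w, V = np.linalg.eigh(F)
F_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

eta = 0.5
theta0 = np.zeros(p)

# (a) One natural gradient step on the original parameters.
theta_ngd = theta0 - eta * np.linalg.solve(F, grad(theta0))

# (b) "Reconstructed" model theta = F^{-1/2} phi: plain GD on phi, then map back.
# Chain rule: dL/dphi = (F^{-1/2})^T dL/dtheta, and F^{-1/2} is symmetric.
phi0 = np.linalg.solve(F_inv_sqrt, theta0)
phi1 = phi0 - eta * F_inv_sqrt @ grad(F_inv_sqrt @ phi0)
theta_gd_reparam = F_inv_sqrt @ phi1

print(np.allclose(theta_ngd, theta_gd_reparam))  # True
```

Because F is constant in this toy model, the equivalence holds at every later step as well. In a deep network F(θ) changes during training and has P×P entries, so a fixed, exact reparameterization like this one is not available, and any practical reconstruction has to approximate or structure it.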
