LG OC | Nov 27, 2020

Eigenvalue-corrected Natural Gradient Based on a New Approximation

Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu

arXiv:2011.13609v1 | 6.57 citations

Originality: Incremental advance

AI Analysis

This work offers an incremental improvement in optimization methods for deep neural networks, potentially benefiting researchers and practitioners seeking more efficient training algorithms.

This paper proposes TEKFAC, a new second-order optimization method for deep neural networks. It combines ideas from EKFAC and a method by Gao et al. (2020) to correct re-scaling factors and use a new approximation for the Fisher information matrix. TEKFAC empirically outperforms SGD with momentum, Adam, EKFAC, and TKFAC on several DNNs.

Second-order optimization methods for training deep neural networks (DNNs) have attracted considerable research interest. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC) (George et al., 2018), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factor in the Kronecker-factored eigenbasis. Gao et al. (2020) consider a new approximation to the natural gradient, which approximates the Fisher information matrix (FIM) as a constant multiplied by the Kronecker product of two matrices and keeps the trace equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factor under the Kronecker-factored eigenbasis, but also adopts the new approximation and the effective damping technique proposed in Gao et al. (2020). We also discuss the differences and relationships among the Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
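
The two ingredients named in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single fully connected layer, builds an empirical Fisher matrix from random data, and uses trace-normalized factors as one simple way to satisfy the "constant times Kronecker product, same trace" description. The variable names (`acts`, `grads`, `F_tr`, `F_ek`) are hypothetical, and damping is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 4, 3

# Toy layer statistics (assumed data): inputs a and back-propagated
# gradients d, made correlated so the Kronecker assumption is inexact.
acts = rng.normal(size=(n, d_in))
grads = rng.normal(size=(n, d_out)) * (1.0 + acts[:, :1] ** 2)

# Per-example weight gradients, row-major vec(d a^T) = kron(d, a).
J = np.einsum("no,ni->noi", grads, acts).reshape(n, -1)
F = J.T @ J / n  # empirical Fisher of the layer (assumption: no sampling from the model)

# Kronecker factors of the layer.
A = acts.T @ acts / n
G = grads.T @ grads / n

# Trace-restricted approximation: a constant times the Kronecker product
# of two unit-trace matrices, with the constant chosen so the trace
# matches trace(F).
F_tr = np.trace(F) * np.kron(G / np.trace(G), A / np.trace(A))

# Eigenvalue correction: keep the Kronecker-factored eigenbasis, but
# replace the re-scaling factors by second moments of the projected gradients.
_, U_A = np.linalg.eigh(A)
_, U_G = np.linalg.eigh(G)
U = np.kron(U_G, U_A)
s = np.mean((J @ U) ** 2, axis=0)
F_ek = (U * s) @ U.T

print("trace(F)     :", np.trace(F))
print("trace(F_tr)  :", np.trace(F_tr))
print("trace(F_ek)  :", np.trace(F_ek))
print("error of F_tr:", np.linalg.norm(F - F_tr))
print("error of F_ek:", np.linalg.norm(F - F_ek))
```

In this toy setting both approximations keep the trace of `F`. Because `F_tr` and `F_ek` share the same eigenbasis and the corrected factors `s` are the diagonal of `U.T @ F @ U`, the Frobenius error of `F_ek` cannot exceed that of `F_tr`.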
