LG, ML · Jun 11, 2018

Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis

Thomas George, César Laurent, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent

arXiv:1806.03884v2 23.2210 citations

Originality Highly original

AI Analysis

This work addresses the problem of inefficient optimization in deep learning for researchers and practitioners, offering an incremental improvement over existing methods.

The paper tackles the computational challenge of using gradient covariance in natural gradient descent for large models by proposing a novel approximation that is provably better than KFAC and allows cheap updates, showing improvements in optimization speed for deep networks.

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance not in parameter coordinates but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
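To make the idea of "a diagonal variance in a Kronecker-factored eigenbasis" concrete, here is a minimal NumPy sketch for a single fully connected layer without bias. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kfe_second_moments`, `ekfac_direction`), the argument names (`acts`, `grads`, `damping`), and the use of plain mini-batch averages instead of running estimates are all hypothetical choices made for this example. It relies on the fact that, for such a layer, the per-example weight gradient is the outer product of the back-propagated gradient and the input activation.

```python
import numpy as np


def kfe_second_moments(acts, grads):
    """Eigenbases of the two Kronecker factors plus per-coordinate second moments.

    acts:  (N, d_in)  layer inputs, one row per example (assumed name).
    grads: (N, d_out) gradients w.r.t. the layer pre-activations (assumed name).
    """
    n = acts.shape[0]
    # Kronecker factors used by KFAC: second moments of inputs and of output gradients.
    A = acts.T @ acts / n
    B = grads.T @ grads / n
    # Their eigenvectors define the Kronecker-factored eigenbasis.
    _, U_A = np.linalg.eigh(A)
    _, U_B = np.linalg.eigh(B)
    # The per-example gradient g a^T, rotated into that basis, has entries
    # (U_B^T g)_i * (U_A^T a)_j, so its mean square needs no explicit outer products.
    a_proj = acts @ U_A
    g_proj = grads @ U_B
    scale = (g_proj ** 2).T @ (a_proj ** 2) / n  # shape (d_out, d_in)
    return U_A, U_B, scale


def ekfac_direction(acts, grads, damping=1e-3):
    """Mean gradient rescaled by the diagonal second moments in the eigenbasis."""
    n = acts.shape[0]
    U_A, U_B, scale = kfe_second_moments(acts, grads)
    mean_grad = grads.T @ acts / n            # (d_out, d_in) mini-batch gradient
    mean_proj = U_B.T @ mean_grad @ U_A       # rotate into the eigenbasis
    scaled = mean_proj / (scale + damping)    # diagonal rescaling (damping is an assumption)
    return U_B @ scaled @ U_A.T               # rotate back to parameter coordinates
```

In this sketch the eigenvectors and the diagonal `scale` are computed together, but they are separable: the expensive eigendecompositions could be refreshed rarely while `scale` is re-estimated from each mini-batch, which is one way to read the "cheap partial updates" mentioned in the abstract.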
