ML · Jun 12, 2017

Practical Gauss-Newton Optimisation for Deep Learning

Aleksandar Botev, Hippolyt Ritter, David Barber

arXiv:1706.03662v2 · 31.4263 citations

Originality: Incremental advance

AI Analysis

This work addresses optimization efficiency and hyperparameter tuning issues for deep learning practitioners, offering a practical alternative to first-order methods.

The paper tackles efficient second-order optimization in deep learning by introducing a block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. The resulting method is competitive with state-of-the-art first-order methods, sometimes improves on them significantly, and performs well with default settings.

We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process, our approach can provide good performance even when used with default settings. A side result of our work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.
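To make the idea of a block-diagonal Gauss-Newton approximation concrete, here is a minimal NumPy sketch. It is not the algorithm from the paper: it assumes a toy two-layer ReLU network with squared-error loss, builds each layer's Gauss-Newton block explicitly from the layer Jacobian, and uses a hand-picked damping value. The names `forward`, `layer_jacobians`, `block_diag_gn_directions`, and `apply_step` are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward(params, x):
    """Toy two-layer network (assumed for illustration): y = W2 relu(W1 x)."""
    W1, W2 = params
    h = np.maximum(W1 @ x, 0.0)  # ReLU is a piecewise linear transfer function
    return W2 @ h, h

def layer_jacobians(params, x):
    """Jacobian of the output w.r.t. each layer's row-major flattened weights."""
    W1, W2 = params
    _, h = forward(params, x)
    mask = (W1 @ x > 0).astype(float)  # derivative of ReLU at the pre-activations
    # d y_i / d W1[k, j] = W2[i, k] * mask[k] * x[j]
    J1 = np.kron(W2 * mask[None, :], x[None, :])
    # d y_i / d W2[k, j] = (i == k) * h[j]
    J2 = np.kron(np.eye(W2.shape[0]), h[None, :])
    return [J1, J2]

def block_diag_gn_directions(params, xs, ts, damping=1e-2):
    """Per-layer gradients and damped Gauss-Newton directions.

    Only the diagonal (per-layer) blocks J_l^T J_l are kept; cross-layer
    blocks are discarded. For squared error the loss Hessian w.r.t. the
    output is the identity, so it does not appear explicitly.
    """
    n = len(xs)
    grads = [np.zeros(W.size) for W in params]
    blocks = [np.zeros((W.size, W.size)) for W in params]
    for x, t in zip(xs, ts):
        y, _ = forward(params, x)
        r = y - t  # gradient of 0.5 * ||y - t||^2 w.r.t. y
        for l, J in enumerate(layer_jacobians(params, x)):
            grads[l] += J.T @ r / n
            blocks[l] += J.T @ J / n
    # Damping (an assumed constant here) keeps each block invertible.
    deltas = [np.linalg.solve(G + damping * np.eye(G.shape[0]), g)
              for g, G in zip(grads, blocks)]
    return grads, deltas

def apply_step(params, deltas, step_size=1.0):
    """Move each layer's weights along the negative Gauss-Newton direction."""
    return [W - step_size * d.reshape(W.shape) for W, d in zip(params, deltas)]
```

Because each damped block is positive definite, every per-layer direction has a positive inner product with that layer's gradient, so stepping along its negative is a descent direction for a small enough step. Forming the blocks explicitly costs memory quadratic in the layer size, which is only workable for tiny layers like the ones assumed here.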
