LG · ML · May 25, 2017

Diagonal Rescaling For Neural Networks

arXiv:1705.09319v1 · 10 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses robustness issues in neural network optimization, offering incremental improvements to training algorithms for machine learning practitioners.

The paper examines why a second-order stochastic gradient training algorithm for neural networks lacks robustness. It proposes a new way to scale the stepsizes and stresses the importance of handling fast changes in the curvature of the cost, which clarifies connections to existing techniques such as RMSProp and fanin stepsize scaling.

We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fanin stepsize scaling. The second insight stresses the practical importance of dealing with fast changes of the curvature of the cost.
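For orientation, the two existing techniques the abstract names can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: `rmsprop_step` is the standard RMSProp update, and `fanin_scaled_step` assumes one common form of the fanin trick in which a unit's stepsize is divided by its number of incoming connections. The function names, default hyperparameters, and list-based representation are hypothetical choices made for this example.

```python
import math


def rmsprop_step(w, g, v, lr=0.01, rho=0.9, eps=1e-8):
    """One standard RMSProp update on flat lists of floats.

    v is the running average of squared gradients; each coordinate's
    step is divided by the square root of its own average, which is a
    diagonal rescaling of the gradient.
    """
    new_v = [rho * vi + (1.0 - rho) * gi * gi for vi, gi in zip(v, g)]
    new_w = [
        wi - lr * gi / (math.sqrt(vi) + eps)
        for wi, gi, vi in zip(w, g, new_v)
    ]
    return new_w, new_v


def fanin_scaled_step(w, g, fan_in, lr=0.01):
    """Plain SGD step for one unit's incoming weights.

    Assumption for this sketch: the stepsize is divided by the fanin
    (the number of incoming connections of the unit).
    """
    scaled_lr = lr / fan_in
    return [wi - scaled_lr * gi for wi, gi in zip(w, g)]


if __name__ == "__main__":
    # Two coordinates whose gradients differ by a factor of 100.
    w, v = rmsprop_step([1.0, 1.0], [0.1, 10.0], [0.0, 0.0])
    print("RMSProp weights:", w)

    # A unit with 4 incoming connections.
    print("Fanin-scaled weights:", fanin_scaled_step([1.0], [2.0], fan_in=4, lr=0.1))
```

On the first step from a zero running average, RMSProp moves both coordinates by nearly the same amount even though their gradients differ by a factor of 100. That per-coordinate normalization is the kind of diagonal rescaling the abstract relates to stepsize scaling.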

Code Implementations: 1 repo