LG · ML · Jun 21, 2019

Adaptive Learning Rate Clipping Stabilizes Learning

arXiv:1906.09060v2 · 42 citations · Has Code
AI Analysis

This work addresses training instability for researchers and practitioners who use small batch sizes or high-order loss functions. It is incremental: it complements existing methods without fundamentally changing them.

The paper tackles training instability in neural networks caused by "bad batches" with high losses. It introduces adaptive learning rate clipping (ALRC), which limits backpropagated losses to a number of standard deviations above their running means. ALRC decreases errors in unstable training scenarios: mean quartic error training for CIFAR-10 supersampling and mean squared error training for partial scanning transmission electron micrograph completion.

Artificial neural network training with stochastic gradient descent can be destabilized by "bad batches" with high losses. This is often problematic for training with small batch sizes, high-order loss functions or unstably high learning rates. To stabilize learning, we have developed adaptive learning rate clipping (ALRC) to limit backpropagated losses to a number of standard deviations above their running means. ALRC is designed to complement existing learning algorithms: our algorithm is computationally inexpensive, can be applied to any loss function or batch size, is robust to hyperparameter choices and does not affect backpropagated gradient distributions. Experiments with CIFAR-10 supersampling show that ALRC decreases errors for unstable mean quartic error training while stable mean squared error training is unaffected. We also show that ALRC decreases unstable mean squared errors for partial scanning transmission electron micrograph completion. Our source code is publicly available at https://github.com/Jeffrey-Ede/ALRC
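
The clipping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the running mean and standard deviation come from exponential moving averages of the loss and the squared loss, and that the averages are updated with the clipped loss. The class name `AdaptiveLossClipper`, its parameters (`n_sigma`, `decay`, `init_mean`, `init_sq_mean`) and their default values are all hypothetical choices made for illustration.

```python
import math


class AdaptiveLossClipper:
    """Hypothetical sketch: cap a batch loss at mean + n_sigma * std of past losses."""

    def __init__(self, n_sigma=3.0, decay=0.999, init_mean=1.0, init_sq_mean=2.0):
        # All defaults are illustrative assumptions, not values from the paper.
        self.n_sigma = n_sigma        # number of standard deviations allowed
        self.decay = decay            # exponential moving average decay rate
        self.mean = init_mean         # running estimate of E[loss]
        self.sq_mean = init_sq_mean   # running estimate of E[loss^2]

    def scale(self, loss):
        """Return the factor the loss should be multiplied by before backpropagation."""
        # Standard deviation from the two running moments (guard against
        # small negative values caused by rounding).
        variance = max(self.sq_mean - self.mean ** 2, 0.0)
        threshold = self.mean + self.n_sigma * math.sqrt(variance)

        # Shrink losses above the threshold down to the threshold;
        # leave all other losses untouched.
        factor = threshold / loss if loss > threshold else 1.0
        clipped = factor * loss

        # Assumption of this sketch: update the running moments with the clipped loss.
        self.mean = self.decay * self.mean + (1.0 - self.decay) * clipped
        self.sq_mean = self.decay * self.sq_mean + (1.0 - self.decay) * clipped ** 2
        return factor


clipper = AdaptiveLossClipper()
for batch_loss in [1.2, 0.9, 8.0, 1.1]:
    factor = clipper.scale(batch_loss)
    print(f"loss={batch_loss:.2f} factor={factor:.3f} clipped={factor * batch_loss:.3f}")
```

In an automatic differentiation framework, the factor would be treated as a constant with no gradient flowing through it, so multiplying the loss by it rescales the gradients of a bad batch without changing their direction.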

