LG, DC, OC, ML · Jul 2, 2020

Adaptive Braking for Mitigating Gradient Delay

arXiv:2007.01397v2 · 5 citations
AI Analysis

This work addresses the performance degradation that practitioners see with asynchronous training methods, though it is an incremental improvement over existing momentum-based optimizers.

The paper tackles gradient delay in asynchronous neural network training, which degrades model performance. It introduces Adaptive Braking (AB) to stabilize optimization, enabling training with delays of 32 or more update steps on CIFAR-10 and ImageNet-1k with minimal accuracy loss.

Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. Asynchronous methods remove synchronization overheads and improve hardware utilization at the cost of introducing gradient delay, which impedes optimization and can lead to lower final model performance. We introduce Adaptive Braking (AB), a modification for momentum-based optimizers that mitigates the effects of gradient delay. AB dynamically scales the gradient based on the alignment of the gradient and the velocity. This can dampen oscillations along high curvature directions of the loss surface, stabilizing and accelerating asynchronous training. We show that applying AB on top of SGD with momentum enables training ResNets on CIFAR-10 and ImageNet-1k with delays $D \geq 32$ update steps with minimal drop in final test accuracy.
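The abstract says only that AB scales the gradient by its alignment with the velocity and does not give the formula. The sketch below is one plausible reading, not the paper's definitive method. It assumes the scale is 1 - rho * cos(angle between gradient and velocity), so a gradient opposing the velocity is amplified (braking) and an aligned one is damped. The names `ab_scale`, `sgdm_ab_step`, and `train_with_delay` are hypothetical, as are the hyperparameter `rho` and its default value. The delay loop simply evaluates each gradient at weights that are `delay` steps old.

```python
from collections import deque

import numpy as np


def ab_scale(grad, velocity, rho=0.5, eps=1e-12):
    """Assumed AB scaling: 1 - rho * cosine(grad, velocity).

    This form is an assumption; the abstract does not state it.
    """
    denom = max(np.linalg.norm(grad) * np.linalg.norm(velocity), eps)
    cos = float(np.dot(grad.ravel(), velocity.ravel())) / denom
    return 1.0 - rho * cos


def sgdm_ab_step(w, velocity, grad, lr=0.1, momentum=0.9, rho=0.5):
    """One SGD-with-momentum step where the gradient is rescaled first."""
    scale = ab_scale(grad, velocity, rho)
    velocity = momentum * velocity + scale * grad
    w = w - lr * velocity
    return w, velocity


def train_with_delay(grad_fn, w0, delay, steps, **step_kwargs):
    """Simulate gradient delay: gradients use weights `delay` steps old."""
    w = np.asarray(w0, dtype=float)
    velocity = np.zeros_like(w)
    # Sliding window of the last delay + 1 weight vectors; index 0 is oldest.
    history = deque([w.copy()] * (delay + 1), maxlen=delay + 1)
    for _ in range(steps):
        stale_grad = grad_fn(history[0])
        w, velocity = sgdm_ab_step(w, velocity, stale_grad, **step_kwargs)
        history.append(w.copy())
    return w
```

For example, `train_with_delay(lambda w: w, [1.0], delay=32, steps=500)` would run the assumed update on the quadratic loss 0.5 * w**2 with a 32-step delay. Setting `rho=0` recovers plain SGD with momentum, which makes a side-by-side comparison straightforward.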

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
