LG · ML · Feb 4, 2020

Large Batch Training Does Not Need Warmup

arXiv:2002.01576v1 · 6 citations
AI Analysis

This paper addresses efficient large-batch training of deep neural networks, proposing a new optimizer and a convergence analysis that connects common large-batch heuristics to theory.

The paper tackles the problem of slow convergence in large-batch training of deep neural networks by proposing the CLARS algorithm, which outperforms gradual warmup and achieves state-of-the-art convergence on ImageNet with networks like ResNet, DenseNet, and MobileNet.

Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. However, the optimizer converges slowly at early epochs and there is a gap between large-batch deep learning optimization heuristics and theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We also analyze the convergence rate of the proposed method by introducing a new fine-grained analysis of gradient-based methods. Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques: linear learning rate scaling, gradual warmup, and layer-wise adaptive rate scaling. Extensive experiments demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and converges faster than the state-of-the-art large-batch optimizer when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset.
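
For orientation, the three existing techniques named in the abstract can be sketched roughly as follows. This is a minimal Python illustration of those prior heuristics as they are commonly described, not of CLARS itself, whose update rule is not stated on this page. The function names, the reference batch size of 256, the linear warmup shape, and the trust coefficient default are all illustrative assumptions.

```python
import math


def linear_scaled_lr(base_lr, batch_size, base_batch_size=256):
    """Linear learning rate scaling: grow the rate in proportion to batch size.

    base_batch_size=256 is an assumed reference value, not taken from the paper.
    """
    return base_lr * batch_size / base_batch_size


def warmup_lr(target_lr, step, warmup_steps):
    """Gradual warmup: ramp the rate linearly up to target_lr, then hold it."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def lars_layer_lr(global_lr, weights, grads, trust_coef=0.001,
                  weight_decay=0.0, eps=1e-12):
    """Layer-wise adaptive rate scaling in the LARS style.

    Each layer's rate is scaled by the ratio of its weight norm to its
    gradient norm. trust_coef and eps are assumed defaults for illustration.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        # Fall back to the global rate when the ratio is undefined.
        return global_lr
    return global_lr * trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    peak = linear_scaled_lr(base_lr=0.1, batch_size=8192)
    for step in range(5):
        print(step, warmup_lr(peak, step, warmup_steps=5))
    print(lars_layer_lr(peak, weights=[3.0, 4.0], grads=[0.6, 0.8]))
```

In this sketch the warmup schedule and the layer-wise ratio both reduce the effective step size early in training. The paper's title suggests that a suitably complete layer-wise scheme can make the warmup stage unnecessary.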

Code Implementations: 1 repo