LG · DC · ML · Nov 20, 2019

Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates

arXiv:1911.09030v2 · 46 citations
Originality: Highly original
AI Analysis

This work addresses the communication efficiency problem in distributed machine learning training, offering a significant improvement for large-scale applications.

The paper tackles the communication bottleneck in distributed training by proposing a novel SGD variant that reduces communication overhead and incorporates adaptive learning rates, achieving up to a 30% reduction in training time on the 1B word dataset.

When scaling distributed training, the communication overhead is often the bottleneck. In this paper, we propose a novel SGD variant with reduced communication and adaptive learning rates. We prove the convergence of the proposed algorithm for smooth but non-convex problems. Empirical results show that the proposed algorithm significantly reduces the communication overhead, which, in turn, reduces the training time by up to 30% for the 1B word dataset.
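The abstract combines two ingredients: fewer synchronizations (local updates between averaging steps) and adaptive learning rates. The following is a minimal Python sketch of that general pattern, in which local SGD with periodic model averaging is combined with an AdaGrad-style per-coordinate step size. It is an illustration under stated assumptions and is not the paper's Local AdaAlter update rule. The function name `local_adaptive_sgd`, the choice to also average the squared-gradient accumulators at each synchronization, and the parameters `lr`, `eps`, `local_steps`, and `rounds` are all hypothetical. Workers communicate once every `local_steps` updates, so `rounds` synchronizations stand in for the `rounds * local_steps` synchronizations that fully synchronous SGD would need.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def local_adaptive_sgd(
    grad_fns: List[Callable[[Vector], Vector]],
    x0: Vector,
    lr: float = 0.5,
    eps: float = 1e-8,
    local_steps: int = 4,
    rounds: int = 25,
) -> Tuple[Vector, int]:
    """Hypothetical sketch (not the paper's exact algorithm): local SGD with
    periodic averaging and AdaGrad-style per-coordinate learning rates.

    Each worker runs `local_steps` updates using its own gradient oracle, then
    all workers average their parameters and squared-gradient accumulators.
    Returns the averaged parameters and the number of communication rounds.
    """
    n, dim = len(grad_fns), len(x0)
    x_global = list(x0)
    acc_global = [0.0] * dim  # running sum of squared gradients, shared after each sync
    comm_rounds = 0
    for _ in range(rounds):
        local_xs, local_accs = [], []
        for grad in grad_fns:
            # Every worker starts the round from the shared state.
            x, acc = list(x_global), list(acc_global)
            for _ in range(local_steps):
                g = grad(x)
                for i in range(dim):
                    acc[i] += g[i] * g[i]
                    # Per-coordinate step shrinks as squared gradients accumulate.
                    x[i] -= lr * g[i] / (math.sqrt(acc[i]) + eps)
            local_xs.append(x)
            local_accs.append(acc)
        # One communication round: average models and accumulators across workers.
        x_global = [sum(xs[i] for xs in local_xs) / n for i in range(dim)]
        acc_global = [sum(a[i] for a in local_accs) / n for i in range(dim)]
        comm_rounds += 1
    return x_global, comm_rounds


# Usage: two workers with quadratic losses 0.5 * ||x - c_k||^2.
# The average loss is minimized at the mean of the centers, [2.0, 0.0].
centers = [[1.0, -1.0], [3.0, 1.0]]
workers = [lambda x, c=c: [xi - ci for xi, ci in zip(x, c)] for c in centers]
x_final, comms = local_adaptive_sgd(workers, [0.0, 0.0])
print(x_final, comms)  # comms == 25: one sync per 4 local steps per worker
```

Raising `local_steps` cuts the number of synchronizations proportionally, at the cost of letting each worker drift toward its own local optimum between averaging steps.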

Code implementations: 1 repo
