LG, DC, ML · Oct 16, 2019

A Double Residual Compression Algorithm for Efficient Distributed Learning

arXiv:1910.07561v1 · 55 citations
Originality: Highly original
AI Analysis

This paper addresses the communication cost of distributed learning for large-scale models, where exchanging gradients and model updates between master and worker nodes limits training efficiency.

The paper tackles the communication bottleneck in distributed machine learning by proposing DORE, a double residual compression algorithm that reduces overall communication by over 95% while maintaining similar model accuracy and convergence speed compared to state-of-the-art baselines.

Large-scale machine learning models are often trained by parallel stochastic gradient descent algorithms. However, the communication cost of gradient aggregation and model synchronization between the master and worker nodes becomes the major obstacle for efficient learning as the number of workers and the dimension of the model increase. In this paper, we propose DORE, a DOuble REsidual compression stochastic gradient descent algorithm, to reduce over $95\%$ of the overall communication such that the obstacle can be immensely mitigated. Our theoretical analyses demonstrate that the proposed strategy has superior convergence properties for both strongly convex and nonconvex objective functions. The experimental results validate that DORE achieves the best communication efficiency while maintaining similar model accuracy and convergence speed in comparison with state-of-the-art baselines.
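The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea of residual compression for one direction of communication, not the paper's algorithm. It assumes a stochastic ternary quantizer and a shared running state with step size `alpha`. The names `ternary_quantize`, `ResidualChannel`, and `alpha` are all hypothetical. The sender transmits a quantized version of the difference between its vector and the state, and both ends advance the state identically so they stay in sync. A "double" scheme would presumably use one such channel for gradients (worker to master) and another for model updates (master to worker).

```python
import random


def ternary_quantize(vec, rng):
    """Hypothetical unbiased quantizer: each entry becomes -s, 0, or +s,
    where s = max|v|, and an entry is kept with probability |v| / s."""
    scale = max((abs(v) for v in vec), default=0.0)
    if scale == 0.0:
        return [0.0] * len(vec)
    out = []
    for v in vec:
        keep = rng.random() < abs(v) / scale
        out.append((scale if v > 0 else -scale) if keep else 0.0)
    return out


class ResidualChannel:
    """One end of a one-directional link (assumed design, not from the paper).
    Sender and receiver each hold an instance with the same `alpha`."""

    def __init__(self, dim, alpha=0.1):
        self.state = [0.0] * dim  # running state shared by both ends
        self.alpha = alpha

    def _advance(self, msg):
        # Both ends apply the same update, so their states never diverge.
        self.state = [h + self.alpha * d for h, d in zip(self.state, msg)]

    def encode(self, vec, rng):
        # Compress the residual (vec - state) instead of vec itself.
        residual = [v - h for v, h in zip(vec, self.state)]
        msg = ternary_quantize(residual, rng)
        self._advance(msg)
        return msg

    def decode(self, msg):
        # Reconstruct an estimate of the sender's vector from state + message.
        estimate = [h + d for h, d in zip(self.state, msg)]
        self._advance(msg)
        return estimate


rng = random.Random(0)
worker_side = ResidualChannel(dim=4)
master_side = ResidualChannel(dim=4)

gradient = [0.5, -1.0, 0.25, 0.0]
msg = worker_side.encode(gradient, rng)
estimate = master_side.decode(msg)
```

Each message entry takes one of three values plus a single shared scale, which is where the communication saving would come from in a scheme like this. The actual compressor, step sizes, and error handling used by DORE are defined in the paper itself.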
