LG DS ML Sep 4, 2020

On Communication Compression for Distributed Optimization on Heterogeneous Data

arXiv:2009.02388v2, 31 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the communication bottleneck in distributed machine learning training on non-iid data. The contribution is incremental, as it builds on existing compression methods.

The paper analyzes the impact of heterogeneous data on distributed optimization with gradient compression. It finds that D-EF-SGD is less affected than D-QSGD, although both slow down when data skewness is high. It also identifies two alternatives: a recently proposed method for strongly convex problems, and a more general approach that applies only to linear compressors.

Lossy gradient compression, with either unbiased or biased compressors, has become a key tool to avoid the communication bottleneck in centrally coordinated distributed training of machine learning models. We analyze the performance of two standard and general types of methods: (i) distributed quantized SGD (D-QSGD) with arbitrary unbiased quantizers and (ii) distributed SGD with error-feedback and biased compressors (D-EF-SGD) in the heterogeneous (non-iid) data setting. Our results indicate that D-EF-SGD is much less affected than D-QSGD by non-iid data, but both methods can suffer a slowdown if data-skewness is high. We further study two alternatives that are not (or much less) affected by heterogeneous data distributions: first, a recently proposed method that is effective on strongly convex problems, and second, a more general approach that is applicable to linear compressors only but effective in all considered scenarios.
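To make the two method families in the abstract concrete, here is a minimal sketch of one round of each, written in plain Python. It is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, rand-k sparsification stands in for "an arbitrary unbiased quantizer", top-k stands in for "a biased compressor", and the error-feedback update follows the commonly used formulation in which each worker compresses its step plus the residual left over from earlier rounds.

```python
import random


def rand_k(x, k, rng):
    """Unbiased compressor (assumed example): keep k random coordinates,
    rescaled by d/k so that the expectation equals x."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    return [xi * d / k if i in keep else 0.0 for i, xi in enumerate(x)]


def top_k(x, k):
    """Biased compressor (assumed example): keep the k largest-magnitude coordinates."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]


def mean_vec(vectors):
    """Coordinate-wise average, i.e. what the central coordinator computes."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def d_qsgd_step(x, grads, lr, k, rng):
    """One round of the D-QSGD-style sketch: each worker sends a compressed gradient."""
    avg = mean_vec([rand_k(g, k, rng) for g in grads])
    return [xi - lr * ai for xi, ai in zip(x, avg)]


def d_ef_sgd_step(x, grads, errors, lr, k):
    """One round of the D-EF-SGD-style sketch: each worker compresses its step
    plus its stored residual, and keeps whatever the compressor dropped."""
    msgs, new_errors = [], []
    for g, e in zip(grads, errors):
        p = [ei + lr * gi for ei, gi in zip(e, g)]  # step plus old residual
        c = top_k(p, k)                             # what is actually sent
        msgs.append(c)
        new_errors.append([pi - ci for pi, ci in zip(p, c)])
    avg = mean_vec(msgs)
    return [xi - ai for xi, ai in zip(x, avg)], new_errors


# Toy non-iid setup (assumed): worker i minimizes 0.5 * ||x - targets[i]||^2,
# so local gradients disagree even at the global optimum.
targets = [[4.0, 0.0, 0.0, 0.0], [0.0, -4.0, 0.0, 0.0]]
x_q = [0.0] * 4
x_ef, errs = [0.0] * 4, [[0.0] * 4 for _ in targets]
rng = random.Random(0)
for _ in range(200):
    x_q = d_qsgd_step(x_q, [[a - b for a, b in zip(x_q, t)] for t in targets], 0.1, 1, rng)
    x_ef, errs = d_ef_sgd_step(
        x_ef, [[a - b for a, b in zip(x_ef, t)] for t in targets], errs, 0.1, 1
    )
print("D-QSGD sketch iterate:  ", x_q)
print("D-EF-SGD sketch iterate:", x_ef)
```

The design point worth noticing is the residual: in the error-feedback variant nothing a worker computes is discarded, only delayed, so the iterate minus the average residual always equals what uncompressed SGD would have produced from the same gradients. The unbiased variant has no such memory and instead relies on the compression noise averaging out.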

