LG · DC · ML · Feb 21, 2018

3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning

arXiv:1802.07389v1 · 80 citations
Originality: Incremental advance
AI Analysis

This paper addresses communication bottlenecks in distributed ML systems with a lightweight, effective compression scheme; the advance is incremental but practical for improving training efficiency.

The paper tackles communication overhead in distributed machine learning by introducing 3LC, a lossy compression scheme that reduces traffic while keeping test accuracy almost unchanged, achieving up to 107X compression and cutting wall-clock training time by up to 23X for ResNet-110 on CIFAR-10.

The performance and efficiency of distributed machine learning (ML) depend significantly on how long it takes for nodes to exchange state changes. Overly aggressive attempts to reduce communication often sacrifice final model accuracy and necessitate additional ML techniques to compensate for this loss, limiting their generality. Some attempts to reduce communication incur high computation overhead, which makes their performance benefits visible only over slow networks. We present 3LC, a lossy compression scheme for state change traffic that strikes a balance between multiple goals: traffic reduction, accuracy, computation overhead, and generality. It combines three new techniques (3-value quantization with sparsity multiplication, quartic encoding, and zero-run encoding) to leverage the strengths of quantization and sparsification techniques and avoid their drawbacks. It achieves a data compression ratio of up to 39-107X, almost the same test accuracy of trained models, and high compression speed. Distributed ML frameworks can employ 3LC without modifications to existing ML algorithms. Our experiments show that 3LC reduces wall-clock training time of ResNet-110-based image classifiers for CIFAR-10 on a 10-GPU cluster by up to 16-23X compared to TensorFlow's baseline design.
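
The abstract names the three techniques without spelling them out. The sketch below is one plausible reading, not the authors' implementation. It assumes that 3-value quantization maps each value to -1, 0, or +1 relative to a scaled maximum magnitude, that quartic encoding packs five such values into one byte as base-3 digits (3^5 = 243 fits in a byte), and that zero-run encoding uses the spare byte values 243-255 to stand for runs of all-zero groups. All function names, the multiplier `s`, and the reserved byte values are assumptions made for illustration.

```python
ZERO_GROUP = 121  # base-3 digits 1,1,1,1,1, i.e. five quantized zeros (assumed layout)

def quantize3(values, s=1.0):
    """Map each float to -1, 0, or +1. s is a hypothetical sparsity
    multiplier: a larger s raises the threshold and yields more zeros."""
    m = max((abs(v) for v in values), default=0.0) * s
    if m == 0.0:
        return [0] * len(values), 0.0
    q = [1 if v / m > 0.5 else -1 if v / m < -0.5 else 0 for v in values]
    return q, m  # the receiver would approximate each value as m * q[i]

def pack_base3(q):
    """Pack five 3-valued entries per byte as base-3 digits (0..242)."""
    q = list(q) + [0] * (-len(q) % 5)  # pad with zeros to a multiple of 5
    out = bytearray()
    for i in range(0, len(q), 5):
        b = 0
        for t in q[i:i + 5]:
            b = b * 3 + (t + 1)  # shift -1,0,1 to digits 0,1,2
        out.append(b)
    return bytes(out)

def zero_run_encode(data):
    """Replace runs of 2..14 ZERO_GROUP bytes with one byte in 243..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != ZERO_GROUP:
            out.append(data[i])
            i += 1
            continue
        run = 1
        while run < 14 and i + run < len(data) and data[i + run] == ZERO_GROUP:
            run += 1
        out.append(ZERO_GROUP if run == 1 else 243 + run - 2)
        i += run
    return bytes(out)

def decode(data, n):
    """Invert zero_run_encode and pack_base3, returning the first n values."""
    q = []
    for b in data:
        groups = [b] if b < 243 else [ZERO_GROUP] * (b - 243 + 2)
        for g in groups:
            digits = []
            for _ in range(5):
                digits.append(g % 3 - 1)
                g //= 3
            q.extend(reversed(digits))
    return q[:n]
```

Under these assumptions, a mostly-zero tensor of 50 values shrinks to 2 bytes: `zero_run_encode(pack_base3(quantize3([0.9, -0.8, 0.05] + [0.0] * 47)[0]))` yields one byte for the first group and one byte for the run of nine all-zero groups. The scaling factor `m` would also need to be sent alongside the bytes.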
