LG · DC · IT · Sep 26, 2021

Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

arXiv:2109.12497v2 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the scalability issue in distributed optimization for large-scale ML training, offering incremental improvements over existing compression methods.

The paper tackles the communication bottleneck in distributed machine learning by introducing all-reduce compatible gradient compression schemes, which reduce communication overhead while maintaining performance comparable to vanilla SGD, as demonstrated on the CIFAR10 dataset.

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to this problem. However, the performance of distributed systems does not scale linearly with the number of workers because of the high network communication cost of synchronizing gradients and parameters. Researchers have proposed techniques such as quantization and sparsification to alleviate this problem by compressing the gradients. Most compression schemes produce compressed gradients that cannot be directly aggregated with efficient protocols such as all-reduce. In this paper, we present a set of all-reduce compatible gradient compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla SGD. We present the results of our experiments with the CIFAR10 dataset and the observations derived during the process. Our compression methods perform better than the built-in methods currently offered by deep learning frameworks. Code is available at the repository: https://github.com/vineeths96/Gradient-Compression.
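To make the idea of an all-reduce compatible, unbiased quantizer concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names `stochastic_quantize` and `simulated_allreduce_mean` are hypothetical, the all-reduce is simulated with a Python list in a single process, and the single shared scale (the largest absolute gradient entry across workers) is an assumption chosen for illustration. The point it shows is that when every worker quantizes against the same scale, the integer codes can be summed directly, which is what a sum all-reduce needs.

```python
import numpy as np


def stochastic_quantize(grad, scale, levels, rng):
    """Round grad/scale onto integers in [-levels, levels] without bias.

    Each value is rounded up with probability equal to its fractional
    part, so the expected value of the result equals the scaled input.
    """
    scaled = grad / scale * levels
    lower = np.floor(scaled)
    prob_up = scaled - lower
    rounded = lower + (rng.random(grad.shape) < prob_up)
    return rounded.astype(np.int32)


def simulated_allreduce_mean(worker_grads, levels=127, seed=0):
    """Average gradients using only integer codes and one shared scale.

    Assumption: a real system would obtain `scale` with a max all-reduce
    over one scalar per worker and `total` with a sum all-reduce; both
    are simulated here with plain Python and NumPy.
    """
    rng = np.random.default_rng(seed)

    # Step 1: agree on a single scale shared by all workers.
    scale = max(float(np.max(np.abs(g))) for g in worker_grads)
    if scale == 0.0:
        return np.zeros_like(worker_grads[0], dtype=np.float64)

    # Step 2: every worker quantizes against the shared scale.
    codes = [stochastic_quantize(g, scale, levels, rng) for g in worker_grads]

    # Step 3: integer codes add up directly, as a sum all-reduce would do.
    total = np.sum(codes, axis=0)

    # Step 4: dequantize once and divide by the number of workers.
    return total * (scale / levels) / len(worker_grads)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    grads = [rng.standard_normal(8) for _ in range(4)]
    print("exact mean      :", np.mean(grads, axis=0))
    print("compressed mean :", simulated_allreduce_mean(grads))
```

Because each element is rounded by less than one quantization level, the averaged result differs from the exact mean by less than `scale / levels` per element. Schemes whose workers use different scales or different sparsity patterns lose this property, since their codes cannot be added without first being decompressed.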

Code Implementations: 1 repo