LG · DC · ML · Feb 25, 2020

Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

arXiv:2002.11082v1 · 7 citations
AI Analysis

This work addresses communication bottlenecks in distributed training for computer vision, offering a solution that reduces communication costs while maintaining model performance, though it is incremental relative to existing quantization methods.

The authors tackled the problem of communication overhead in distributed deep learning by deriving an optimal condition for gradient quantization that applies to any gradient distribution, and they developed two novel quantization schemes that demonstrated superior performance on CIFAR and ImageNet datasets with popular CNNs.

The communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications. In particular, the growing size of deep learning models leads to higher communication overheads that defy the ideal linear training speedup regarding the number of devices. Gradient quantization is one of the common methods to reduce communication costs. However, it can lead to quantization error in the training and result in model performance degradation. In this work, we deduce the optimal condition of both the binary and multi-level gradient quantization for ANY gradient distribution. Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively, which dynamically determine the optimal quantization levels. Extensive experimental results on CIFAR and ImageNet datasets with several popular convolutional neural networks show the superiority of our proposed methods.
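
To make the distinction between biased binary and unbiased multi-level quantization concrete, here is a minimal generic sketch in plain Python. It is not the paper's BinGrad or ORQ: the abstract does not give their formulas, so the level choices below (evenly spaced levels, and a sign-times-mean-magnitude binary code) are assumptions picked purely for illustration, and the function names `uniform_levels`, `stochastic_quantize`, and `binary_quantize` are hypothetical.

```python
import bisect
import random


def uniform_levels(grad, num_levels):
    """Evenly spaced levels spanning [min(grad), max(grad)].

    Assumption for illustration only; the paper instead determines the
    levels dynamically from its optimal condition.
    """
    if num_levels < 2:
        raise ValueError("need at least two levels")
    lo, hi = min(grad), max(grad)
    if lo == hi:
        return [lo, hi]
    step = (hi - lo) / (num_levels - 1)
    levels = [lo + i * step for i in range(num_levels)]
    levels[-1] = hi  # guard against floating-point drift at the top level
    return levels


def stochastic_quantize(grad, levels, rng=random):
    """Round each entry to a neighbouring level so that E[q] equals g."""
    out = []
    for g in grad:
        # First level >= g, clamped so that levels[i - 1] <= g <= levels[i].
        i = min(max(bisect.bisect_left(levels, g), 1), len(levels) - 1)
        low, high = levels[i - 1], levels[i]
        if high == low:
            out.append(low)
            continue
        p_high = (g - low) / (high - low)  # probability of rounding up
        out.append(high if rng.random() < p_high else low)
    return out


def binary_quantize(grad):
    """Deterministic, biased 1-bit code: sign(g) times the mean magnitude."""
    scale = sum(abs(g) for g in grad) / len(grad)
    return [scale if g >= 0 else -scale for g in grad]


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [rng.gauss(0.0, 1.0) for _ in range(8)]
    levels = uniform_levels(grad, 4)
    print(stochastic_quantize(grad, levels, rng))
    print(binary_quantize(grad))
```

With L levels, each gradient entry can be sent as an index of about log2(L) bits plus the shared level values, instead of a 32-bit float. The stochastic version adds variance but no bias, while the deterministic binary version has no sampling noise but does not reproduce the gradient in expectation. That trade-off is what the abstract's "biased" and "unbiased" labels refer to.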

