LG | AI | Jul 30, 2021

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

arXiv:2107.14575v1
Originality: Incremental advance
AI Analysis

This paper addresses communication bottlenecks in distributed learning for large-scale NLP and computer vision tasks. It is an incremental improvement over existing gradient quantization methods.

The paper tackles communication inefficiency in distributed learning by proposing DQ-SGD, a dynamic quantization framework for SGD. The framework adjusts the quantization scheme at each gradient descent step to balance communication cost against convergence error. On the AG-News, CIFAR-10, and CIFAR-100 datasets it achieves better trade-offs between communication cost and learning performance than other state-of-the-art gradient quantization methods.

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach to dynamically quantize gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to dynamically adjust the quantization scheme for each gradient descent step by exploring the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, of the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm by minimizing the communication cost under convergence error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks on the AG-News, CIFAR-10, and CIFAR-100 datasets, we demonstrate that our quantization scheme achieves better trade-offs between communication cost and learning performance than other state-of-the-art gradient quantization methods.
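
To make the trade-off in the abstract concrete, here is a minimal sketch of a generic unbiased stochastic uniform quantizer whose bit width can change from step to step. It is not the DQ-SGD algorithm: the abstract does not give the paper's rule for choosing the bit width, so the function names `stochastic_quantize` and `communication_bits`, the max-absolute-value scaling, and the cost model (one sign bit plus `bits` magnitude bits per coordinate, plus one 32-bit scale) are all assumptions made for illustration.

```python
import math
import random


def stochastic_quantize(grad, bits, rng):
    """Quantize a gradient (list of floats) to `bits` magnitude bits per entry.

    Illustrative only: a generic stochastic uniform quantizer, not the
    scheme from the paper. Magnitudes are scaled by the largest absolute
    value and rounded randomly to a neighbouring level so that the
    quantized value is unbiased in expectation.
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return [0.0] * len(grad)
    levels = 2 ** bits - 1  # number of intervals on [0, scale]
    out = []
    for g in grad:
        x = abs(g) / scale * levels  # position in [0, levels]
        low = math.floor(x)
        # Round up with probability equal to the fractional part.
        q = low + (1 if rng.random() < x - low else 0)
        out.append(math.copysign(q / levels * scale, g))
    return out


def communication_bits(dim, bits):
    """Assumed cost model: sign + magnitude bits per entry, plus a 32-bit scale."""
    return dim * (bits + 1) + 32


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.3, -0.7, 0.05, 1.2]
    # Hypothetical per-step bit widths; a dynamic scheme would choose these.
    for bits in (2, 4, 8):
        q = stochastic_quantize(grad, bits, rng)
        err = max(abs(a - b) for a, b in zip(q, grad))
        print(bits, communication_bits(len(grad), bits), err)
```

Under this cost model, each extra bit adds one bit per coordinate to the message, while the worst-case per-coordinate error is bounded by the scale divided by `2**bits - 1`. A dynamic scheme picks the bit width at each step to trade these two quantities against each other.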

