LG · ML · Feb 26, 2020

Moniqua: Modulo Quantized Communication in Decentralized SGD

arXiv:2002.11787v3 · 54 citations
AI Analysis

This work addresses the communication bottleneck in decentralized machine learning and offers a practical way to run distributed training under low bit budgets. It is incremental, since it builds on existing quantized communication methods.

The paper tackles the high communication cost of decentralized SGD by proposing Moniqua, a modulo quantization technique. Moniqua enables 1-bit-per-parameter communication while keeping the same asymptotic convergence rate as full-precision communication, and it requires no additional memory. Empirically, it converges faster in wall-clock time than other quantized decentralized algorithms and remains robust at very low bit budgets when training ResNet20 and ResNet110 on CIFAR10.

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.
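
To make the modulo idea concrete, here is a minimal sketch of how a receiver could recover a neighbor's parameters from a low-bit code, assuming the two workers' parameters are known to differ by less than a bound theta in every coordinate. This illustrates the general technique only and is not the paper's exact algorithm: the function names (centered_mod, modulo_encode, modulo_decode), the choice of period, and the use of deterministic rounding are all assumptions made for this example.

```python
import numpy as np


def centered_mod(z, period):
    """Map z into [-period/2, period/2) by subtracting a multiple of period."""
    return z - period * np.floor(z / period + 0.5)


def modulo_encode(x, theta, bits):
    """Sender side (hypothetical helper): wrap x into one period, then quantize.

    Assumes the receiver holds a vector y with |x - y| < theta elementwise.
    """
    levels = 2 ** bits
    delta = 1.0 / (2 * levels)  # worst-case rounding error on the unit interval
    # Period chosen so that theta + delta * period <= period / 2 (an assumption
    # of this sketch, needed for unambiguous decoding).
    period = 2.0 * theta / (1.0 - 2.0 * delta)
    unit = centered_mod(x, period) / period  # values in [-0.5, 0.5)
    # Integer codes in {0, ..., levels - 1}; each fits in `bits` bits.
    codes = np.rint(unit * levels).astype(np.int64) % levels
    return codes, period


def modulo_decode(codes, y, period, bits):
    """Receiver side (hypothetical helper): undo the wrap using the local y."""
    levels = 2 ** bits
    approx = codes / levels * period  # equals x up to rounding, modulo period
    return centered_mod(approx - y, period) + y


rng = np.random.default_rng(0)
theta, bits = 0.5, 4
y = rng.normal(scale=100.0, size=1000)                 # receiver's parameters
x = y + rng.uniform(-0.9 * theta, 0.9 * theta, 1000)   # sender's parameters
codes, period = modulo_encode(x, theta, bits)
x_hat = modulo_decode(codes, y, period, bits)
max_err = np.max(np.abs(x_hat - x))
```

Only the small integer codes cross the network, even though x itself has a large magnitude. The reconstruction error in this sketch is bounded by the quantization step, period / (2 * 2**bits), independent of the scale of x.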

