LG | DC | ML | May 29, 2023

Quantize Once, Train Fast: Allreduce-Compatible Compression with Provable Guarantees

arXiv:2305.18627v2
Originality: Incremental advance
AI Analysis

This paper addresses communication bottlenecks in large-scale distributed deep learning and offers a practical solution with theoretical guarantees, though it is an incremental advance that builds on existing quantization frameworks.

The paper tackles the high communication overhead of distributed deep learning by introducing Global-QSGD, an Allreduce-compatible gradient quantization method that accelerates training by up to 3.51% over baseline quantization methods while preserving accuracy.

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this bottleneck. However, existing quantization schemes are often incompatible with Allreduce, the dominant communication primitive in distributed deep learning, and many prior solutions rely on heuristics without theoretical guarantees. We introduce Global-QSGD, an Allreduce-compatible gradient quantization method that leverages global norm scaling to reduce communication overhead while preserving accuracy. Global-QSGD is backed by rigorous theoretical analysis, extending standard unbiased compressor frameworks to establish formal convergence guarantees. Additionally, we develop a performance model to evaluate its impact across different hardware configurations. Extensive experiments on NVLink, PCIe, and large-scale cloud environments show that Global-QSGD accelerates distributed training by up to 3.51% over baseline quantization methods, making it a practical and efficient solution for large-scale deep learning workloads.
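To make the idea of Allreduce-compatible quantization with global norm scaling concrete, here is a minimal single-process sketch. It is not the paper's implementation. It assumes the shared scale is the largest per-worker L2 norm (which could be obtained with a scalar max-Allreduce) and uses QSGD-style stochastic rounding to uniformly spaced integer levels. The function names, the `levels` parameter, and the choice of norm are all illustrative assumptions. Because every worker quantizes against the same scale, the integer codes can be summed directly, which is what a sum-Allreduce needs.

```python
import math
import random


def global_norm(grads):
    """Largest L2 norm over all workers (assumed choice of shared scale)."""
    return max(math.sqrt(sum(x * x for x in g)) for g in grads)


def quantize(grad, norm, levels, rng):
    """Stochastically round each coordinate to an integer in [-levels, levels]."""
    if norm == 0.0:
        return [0] * len(grad)
    codes = []
    for x in grad:
        scaled = abs(x) / norm * levels  # lies in [0, levels]
        low = math.floor(scaled)
        # Round up with probability equal to the fractional part, so the
        # expected code equals `scaled` and the quantizer is unbiased.
        level = low + (1 if rng.random() < scaled - low else 0)
        codes.append(level if x >= 0 else -level)
    return codes


def allreduce_sum(vectors):
    """Stand-in for a sum-Allreduce: element-wise sum across workers."""
    return [sum(column) for column in zip(*vectors)]


def compressed_average(grads, levels=16, seed=0):
    """Average the workers' gradients using only summed integer codes."""
    rng = random.Random(seed)
    norm = global_norm(grads)
    codes = [quantize(g, norm, levels, rng) for g in grads]
    total = allreduce_sum(codes)
    # One shared scale decodes the summed codes back to an average gradient.
    scale = norm / (levels * len(grads))
    return [t * scale for t in total]


if __name__ == "__main__":
    workers = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
    print(compressed_average(workers))
```

In this sketch each worker's per-coordinate error is below `norm / levels`, so the decoded average stays within that bound of the exact average. With per-worker scales, by contrast, the integer codes could not be added before decoding.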

