Truncated Non-Uniform Quantization for Distributed SGD
This paper addresses communication efficiency in distributed machine learning systems and represents an incremental improvement over existing quantization methods.
The paper tackles the communication bottleneck in distributed SGD by introducing a two-stage quantization strategy that truncates long-tail noise and applies non-uniform quantization to gradients. Experimental results show the algorithm outperforms existing quantization schemes with better communication efficiency and convergence performance.
To address the communication bottleneck in distributed learning, we introduce a novel two-stage quantization strategy designed to improve the communication efficiency of distributed Stochastic Gradient Descent (SGD). The proposed method first truncates the gradients to mitigate the impact of long-tail noise, then applies non-uniform quantization to the post-truncation gradients based on their statistical characteristics. We provide a comprehensive convergence analysis of the quantized distributed SGD, establishing theoretical guarantees for its performance. Furthermore, by minimizing the convergence error, we derive optimal closed-form solutions for the truncation threshold and the non-uniform quantization levels under given communication constraints. Both theoretical insights and extensive experimental evaluations demonstrate that the proposed algorithm outperforms existing quantization schemes, striking a better balance between communication efficiency and convergence performance.
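The two-stage idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the closed-form optimal threshold and quantization levels are not reproduced here, so the threshold `alpha`, the power-law level spacing in `make_levels`, and the use of stochastic rounding are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import bisect
import random


def truncate(grad, alpha):
    """Stage 1: clip every coordinate to [-alpha, alpha] to suppress long-tail values."""
    return [max(-alpha, min(alpha, g)) for g in grad]


def make_levels(alpha, levels_per_side, power=2.0):
    """Build sorted, symmetric, non-uniform levels in [-alpha, alpha].

    Assumption: a power-law spacing that places more levels near zero,
    where gradient mass is typically concentrated. The paper instead
    derives its levels from the gradient statistics.
    """
    k = levels_per_side
    positive = [alpha * (i / k) ** power for i in range(1, k + 1)]
    return [-p for p in reversed(positive)] + [0.0] + positive


def quantize(grad, levels, rng):
    """Stage 2: stochastically round each coordinate to a neighbouring level.

    The rounding probabilities are chosen so that the expected output
    equals the input for any value inside the level range.
    """
    out = []
    for g in grad:
        i = bisect.bisect_left(levels, g)
        if i == 0:
            out.append(levels[0])        # at or below the lowest level
        elif i == len(levels):
            out.append(levels[-1])       # above the highest level
        else:
            lo, hi = levels[i - 1], levels[i]
            p_up = (g - lo) / (hi - lo)  # probability of rounding up
            out.append(hi if rng.random() < p_up else lo)
    return out


def truncated_quantize(grad, alpha, levels, rng):
    """Apply truncation followed by non-uniform quantization."""
    return quantize(truncate(grad, alpha), levels, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    levels = make_levels(alpha=1.0, levels_per_side=2)
    gradient = [0.02, -0.4, 3.5, -7.0]   # two coordinates lie in the tail
    print(truncated_quantize(gradient, 1.0, levels, rng))
```

With `2 * levels_per_side + 1` levels, each coordinate can be sent as a level index rather than a full-precision float, which is where the communication saving comes from. Truncation introduces bias on the clipped coordinates, while the stochastic rounding adds variance but no further bias, which is the kind of trade-off the convergence analysis is said to optimize.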