Convert, compress, correct: Three steps toward communication-efficient DNN training
This work addresses communication bottlenecks in distributed DNN training with a domain-specific solution; the contribution appears incremental, since it combines existing techniques.
The paper tackles communication inefficiency in distributed deep neural network training by introducing the $\mathsf{CO}_3$ algorithm, which combines quantization, lossless compression, and error correction. Its performance is demonstrated through numerical evaluations on CIFAR-10.
In this paper, we introduce a novel algorithm, $\mathsf{CO}_3$, for communication-efficient distributed Deep Neural Network (DNN) training. $\mathsf{CO}_3$ is a joint training/communication protocol that encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial in the implementation of distributed DNN training over rate-constrained links. The interplay of these three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations on CIFAR-10.
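To illustrate how the three steps could fit together on a single worker, the following is a minimal sketch and not the scheme evaluated in the paper. It rests on several assumptions that the abstract does not specify: the floating-point conversion is taken to be IEEE 754 half precision, the lossless compressor is taken to be zlib (DEFLATE), and error correction is taken to mean carrying the quantization error over to the next round. The function names `worker_step` and `server_decode` are hypothetical.

```python
import struct
import zlib


def worker_step(gradient, residual):
    """Process one gradient vector on a worker (illustrative sketch only).

    gradient, residual: lists of Python floats of equal length.
    Returns the compressed payload and the residual for the next round.
    """
    n = len(gradient)

    # (iii) Error correction (assumed form): add back the quantization
    # error that was left over from the previous round.
    corrected = [g + r for g, r in zip(gradient, residual)]

    # (i) Quantization through floating-point conversion (assumed format:
    # IEEE 754 half precision, struct code "e"). Values beyond the half
    # precision range would raise OverflowError here.
    packed = struct.pack(f"<{n}e", *corrected)
    quantized = struct.unpack(f"<{n}e", packed)

    # The part of the corrected gradient lost to rounding is kept locally.
    new_residual = [c - q for c, q in zip(corrected, quantized)]

    # (ii) Lossless compression (assumed compressor: zlib).
    payload = zlib.compress(packed)
    return payload, new_residual


def server_decode(payload, n):
    """Recover the n quantized gradient entries from a worker payload."""
    return list(struct.unpack(f"<{n}e", zlib.decompress(payload)))


if __name__ == "__main__":
    grad = [0.1, -0.25, 0.0003, 1.5]
    payload, res = worker_step(grad, [0.0] * len(grad))
    print(server_decode(payload, len(grad)))
    print(res)
```

In this sketch, the decoded values plus the stored residual reproduce the corrected gradient, so the rounding error is deferred to a later round instead of being discarded. On short vectors such as the one above, the compressed payload can be larger than the raw half-precision bytes; any gain from the lossless step depends on the length and statistics of the gradient.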