SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
This work addresses communication efficiency for decentralized optimization in machine learning. Its method is novel but incremental, combining event-triggering with compression techniques.
The paper tackles the high communication cost of decentralized training of large-scale machine learning models. It proposes SPARQ-SGD, an event-triggered and compressed algorithm that reduces communication without affecting convergence rates, and reports significant savings over state-of-the-art methods on real datasets.
In this paper, we propose and analyze SPARQ-SGD, an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) that triggers a communication in which quantized and sparsified local model parameters are sent. In SPARQ-SGD, each node takes at least a fixed number ($H$) of local gradient steps and then checks whether its model parameters have changed significantly since its last update; it communicates the further-compressed model parameters only when the change is significant, as specified by a (design) criterion. We prove that SPARQ-SGD converges as $O(\frac{1}{nT})$ and $O(\frac{1}{\sqrt{nT}})$ in the strongly-convex and non-convex settings, respectively. This demonstrates that such aggressive compression, including event-triggered communication, model sparsification, and quantization, does not affect the overall convergence rate compared to uncompressed decentralized training, thereby theoretically yielding communication efficiency for "free". We evaluate SPARQ-SGD on real datasets and demonstrate significant savings in communication over the state of the art.
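The per-node logic described in the abstract (local steps, a locally checked trigger, then a sparsified and quantized message) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the top-k sparsifier, the sign-based quantizer, the squared-norm trigger with a fixed `threshold`, the choice to send the compressed difference from the last communicated copy `x_hat`, and the simplification of checking exactly every `H` steps are all assumptions made for illustration. The consensus (averaging) step between neighbors is omitted.

```python
import numpy as np

def top_k(v, k):
    """Sparsify: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    k = min(k, v.size)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Quantize: replace each nonzero entry by its sign times the mean
    magnitude of the nonzero entries (an assumed, illustrative quantizer)."""
    out = np.zeros_like(v)
    nz = v != 0
    if nz.any():
        out[nz] = np.sign(v[nz]) * np.abs(v[nz]).mean()
    return out

def compress(v, k):
    """Composed operator: sparsify first, then quantize the survivors."""
    return sign_quantize(top_k(v, k))

def maybe_communicate(x, x_hat, step, H, threshold, k):
    """Event-triggered communication check for one node.

    x         : current local parameters
    x_hat     : copy of this node's parameters as last communicated
    step      : local iteration counter
    H         : number of local steps between trigger checks (assumed exact)
    threshold : hypothetical trigger level on the squared change
    k         : number of coordinates kept by the sparsifier

    Returns (message, new_x_hat); message is None when nothing is sent.
    """
    if step % H != 0:
        return None, x_hat              # still inside a block of local steps
    diff = x - x_hat
    if np.dot(diff, diff) <= threshold:
        return None, x_hat              # change too small: stay silent
    msg = compress(diff, k)             # assumed: send the compressed difference
    return msg, x_hat + msg             # sender tracks what receivers now hold

# Usage: a node running plain SGD on a toy quadratic 0.5 * ||x - target||^2.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
x, x_hat, sent = np.zeros(8), np.zeros(8), 0
for step in range(1, 41):
    grad = (x - target) + 0.01 * rng.normal(size=8)   # noisy gradient
    x = x - 0.1 * grad
    msg, x_hat = maybe_communicate(x, x_hat, step, H=5, threshold=0.05, k=3)
    sent += msg is not None
print(f"communication rounds used: {sent} of {40 // 5} possible")
```

In this sketch a node stays silent whenever its parameters have drifted little from what its neighbors last received, so the number of messages depends on how fast the model is changing and not only on the iteration count.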