LG, DC, ML | May 13, 2020

SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization

arXiv:2005.07041v3 | 67 citations
AI Analysis

This paper addresses communication bottlenecks in decentralized optimization for distributed ML systems, offering incremental improvements over existing compressed SGD methods.

The paper tackles communication inefficiency in decentralized training of large-scale ML models. It proposes SQuARM-SGD, in which each node takes local SGD steps with momentum and sends compressed updates to its neighbors. The method achieves a convergence rate matching that of vanilla SGD and shows better test performance than state-of-the-art methods that do not use momentum.

In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors regulated by a locally computable triggering criterion. We provide convergence guarantees of our algorithm for general (non-convex) and convex smooth objectives, which, to the best of our knowledge, is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that the convergence rate of SQuARM-SGD matches that of vanilla SGD. We empirically show that including momentum updates in SQuARM-SGD can lead to better test performance than the current state-of-the-art which does not consider momentum updates.
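The per-node procedure described in the abstract can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation. The names `top_k_sign`, `nesterov_step`, and `local_round` are hypothetical, as are the fixed `threshold` and the particular top-k-plus-sign compressor. The Nesterov update shown is one common formulation. The step in which a node combines the estimates received from its neighbors is omitted entirely. In practice `grad_fn` would return a stochastic gradient.

```python
def top_k_sign(v, k):
    """Sparsify then quantize: keep the k largest-magnitude entries of v and
    replace each by a shared scale times its sign (assumed compressor)."""
    k = min(k, len(v))
    out = [0.0] * len(v)
    if k <= 0:
        return out
    idx = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    scale = sum(abs(v[i]) for i in idx) / k
    for i in idx:
        out[i] = scale if v[i] > 0 else (-scale if v[i] < 0 else 0.0)
    return out


def nesterov_step(x, m, grad_fn, lr, beta):
    """One SGD step with Nesterov momentum (one common formulation)."""
    g = grad_fn(x)
    m = [beta * mi + gi for mi, gi in zip(m, g)]
    x = [xi - lr * (gi + beta * mi) for xi, gi, mi in zip(x, g, m)]
    return x, m


def local_round(x, m, x_hat, grad_fn, lr=0.1, beta=0.9, H=5, k=2,
                threshold=1e-3):
    """Run H local steps, then emit a compressed update only if the locally
    computable trigger fires. x_hat is the estimate of this node's
    parameters that its neighbors currently hold."""
    for _ in range(H):
        x, m = nesterov_step(x, m, grad_fn, lr, beta)
    diff = [xi - hi for xi, hi in zip(x, x_hat)]
    if sum(d * d for d in diff) > threshold:   # assumed trigger: squared drift
        msg = top_k_sign(diff, k)              # what would go to neighbors
        x_hat = [hi + ci for hi, ci in zip(x_hat, msg)]
    else:
        msg = None                             # stay silent this round
    return x, m, x_hat, msg
```

A node would call `local_round` once per synchronization period and transmit `msg` only when it is not `None`. Skipped rounds and the sparse, quantized payload are where the communication savings in this sketch come from.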

