Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods

arXiv:2602.03802v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work examines the efficiency of distributed optimization in machine learning and argues that synchronous methods are sufficient in many modern heterogeneous settings. The advance is incremental, since it builds on existing synchronous approaches.

The paper revisits Synchronous SGD and its variant m-Synchronous SGD, showing theoretically that their time complexities are near-optimal, up to logarithmic factors, in many heterogeneous computation scenarios, despite substantial recent progress in asynchronous methods.

Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. We revisit Synchronous SGD and its robust variant, called $m$-Synchronous SGD, and theoretically show that they are nearly optimal in many heterogeneous computation scenarios, which is somewhat unexpected. We analyze the synchronous methods under random computation times and adversarial partial participation of workers, and prove that their time complexities are optimal in many practical regimes, up to logarithmic factors. While synchronous methods are not universal solutions and there exist tasks where asynchronous methods may be necessary, we show that they are sufficient for many modern heterogeneous computation scenarios.
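To make the setting concrete, the following is a minimal simulation sketch, not the paper's algorithm or analysis. It assumes that m-Synchronous SGD waits in each round only for the m fastest of n workers, which the abstract does not spell out. The exponential computation times, the per-worker means, and the names `round_time` and `simulate` are all hypothetical choices made for illustration.

```python
import random


def round_time(worker_times, m):
    """Time until the m fastest workers have each finished one gradient.

    Assumption: an m-synchronous round ends when the m-th fastest worker
    finishes; m == len(worker_times) recovers fully synchronous SGD.
    """
    return sorted(worker_times)[m - 1]


def simulate(n_workers=8, m=6, rounds=1000, seed=0):
    """Average per-round wall-clock time for synchronous vs m-synchronous rounds."""
    rng = random.Random(seed)
    # Hypothetical heterogeneity: worker i has mean computation time i + 1.
    means = [i + 1 for i in range(n_workers)]
    sync_total = 0.0
    m_sync_total = 0.0
    for _ in range(rounds):
        # Random computation times, drawn independently per worker per round.
        times = [rng.expovariate(1.0 / mu) for mu in means]
        sync_total += round_time(times, n_workers)  # wait for every worker
        m_sync_total += round_time(times, m)        # wait for the m fastest
    return sync_total / rounds, m_sync_total / rounds


if __name__ == "__main__":
    sync_avg, m_sync_avg = simulate()
    print(f"synchronous:   {sync_avg:.2f} per round")
    print(f"m-synchronous: {m_sync_avg:.2f} per round")
```

Under these assumptions, the m-synchronous round is never slower than the fully synchronous one, because it ignores the slowest n - m workers. The sketch measures only time per round. It says nothing about the number of rounds needed to converge, which is the subject of the paper's time-complexity results.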
