LGMLJun 15, 2020

Optimal Complexity in Decentralized Training

arXiv:2006.08085v495 citations
AI Analysis

This work addresses the problem of scaling parallel machine learning systems for researchers and practitioners by providing theoretical insights and a practical algorithm, though it is incremental as it builds on existing decentralized methods.

The paper establishes a tight lower bound on iteration complexity for decentralized training in stochastic non-convex settings, revealing a gap in existing methods like D-PSGD, and proposes DeTAG, a gossip-style algorithm that achieves this bound with a logarithmic gap, showing faster convergence in image classification tasks, especially on unshuffled data and sparse networks.

Decentralization is a promising method of scaling up parallel machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. Our lower bound reveals a theoretical gap in known convergence rates of many existing decentralized training algorithms, such as D-PSGD. We prove by construction this lower bound is tight and achievable. Motivated by our insights, we further propose DeTAG, a practical gossip-style decentralized algorithm that achieves the lower bound with only a logarithm gap. Empirically, we compare DeTAG with other decentralized algorithms on image classification tasks, and we show DeTAG enjoys faster convergence compared to baselines, especially on unshuffled data and in sparse networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes