DCLGMLMar 19, 2018

D$^2$: Decentralized Training over Decentralized Data

arXiv:1803.07068v2392 citations
Originality Highly original
AI Analysis

This addresses a key bottleneck in decentralized machine learning for scenarios where workers have unique data, offering a more robust solution for distributed training.

The paper tackles the problem of decentralized training when data across workers is highly varied, proposing D^2, a decentralized parallel stochastic gradient descent algorithm that reduces sensitivity to data variance, improving convergence rates from O(σ/√(nT) + (nζ^2)^(1/3)/T^(2/3)) to O(σ/√(nT)) and showing significant empirical outperformance over D-PSGD in image classification tasks.

While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be {\em unique} and {\em different}. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are {\em not too different}. In this paper, we ask the question: {\em Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers?} In this paper, we present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance \xr{among workers} (imprecisely, "decentralized" data). The core of D$^2$ is a variance blackuction extension of the standard D-PSGD algorithm, which improves the convergence rate from $O\left({σ\over \sqrt{nT}} + {(nζ^2)^{\frac{1}{3}} \over T^{2/3}}\right)$ to $O\left({σ\over \sqrt{nT}}\right)$ where $ζ^{2}$ denotes the variance among data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluated D$^2$ on image classification tasks where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes