LG DC NA OC ML

Sep 10, 2019

Tighter Theory for Local SGD on Identical and Heterogeneous Data

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

arXiv:1909.04746v4 35.7 486 citations

Originality: Incremental advance

AI Analysis

This work offers incremental theoretical improvements for distributed optimization algorithms, particularly relevant for federated learning and parallel computing applications.

The paper provides a tighter theoretical analysis of Local SGD, improving existing convergence bounds for both identical and heterogeneous data regimes by introducing a new variance notion specific to local SGD with different data, and determining optimal stepsize and local iteration values.

We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug in $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
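To make the role of $H$ concrete, here is a minimal sketch of the local SGD loop: each worker takes $H$ stochastic gradient steps on its own objective, and the iterates are then averaged. The toy objective $f_m(x) = \frac{1}{2}(x - b_m)^2$, the function name `local_sgd`, and all parameter names are assumptions made for illustration; they are not taken from the paper or its experiments.

```python
import random


def local_sgd(targets, stepsize, local_steps, rounds, noise_std=0.0, seed=0):
    """Toy local SGD on f_m(x) = 0.5 * (x - targets[m])**2, one objective per worker.

    Hypothetical illustration: `targets` holds the per-worker minimizers b_m,
    `local_steps` plays the role of H, and `noise_std` controls the Gaussian
    noise added to each gradient to mimic stochastic gradients.
    """
    rng = random.Random(seed)
    x = 0.0  # shared starting point for all workers
    for _ in range(rounds):
        local_iterates = []
        for b in targets:
            x_m = x  # each worker starts the round from the averaged iterate
            for _ in range(local_steps):
                # Gradient of 0.5 * (x_m - b)**2 plus zero-mean noise.
                grad = (x_m - b) + rng.gauss(0.0, noise_std)
                x_m -= stepsize * grad
            local_iterates.append(x_m)
        # Communication step: average the local iterates across workers.
        x = sum(local_iterates) / len(local_iterates)
    return x


if __name__ == "__main__":
    # Two workers with different minimizers (a heterogeneous-data toy case).
    print(local_sgd([1.0, 3.0], stepsize=0.1, local_steps=4, rounds=200))
```

With `local_steps=1` the workers average after every step, which is the $H=1$ case the abstract refers to. Identical data corresponds to all entries of `targets` being equal; heterogeneous data corresponds to different entries.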
