Tighter Theory for Local SGD on Identical and Heterogeneous Data
This work offers incremental theoretical improvements to distributed optimization algorithms, which are particularly relevant to federated learning and parallel computing.
The paper gives a tighter theoretical analysis of local SGD that improves existing convergence bounds in both the identical-data and heterogeneous-data regimes. It introduces a new notion of variance specific to local SGD with heterogeneous data and derives the optimal stepsize and the optimal number of local iterations.
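To make a heterogeneity-dependent variance more concrete, the following Python sketch computes one plausible form of such a quantity on a toy problem. It assumes the quantity is the average, over $M$ devices, of the second moment of each device's stochastic gradient evaluated at the global minimizer $x_*$. This summary does not state the paper's exact definition, and the scalar quadratics $f_m(x) = \tfrac12 (x - c_m)^2$, the Gaussian noise model, and the function names are hypothetical choices made for illustration.

```python
import random

def dissimilarity_at_optimum(centers, noise_stds):
    """Closed form of (1/M) * sum_m E[grad f_m(x*, xi)^2] for scalar quadratics.

    Hypothetical setup: device m has f_m(x) = 0.5 * (x - c_m)**2 and its
    stochastic gradient at x is (x - c_m) + xi, with xi ~ N(0, s_m**2).
    """
    m = len(centers)
    x_star = sum(centers) / m  # minimizer of f = (1/M) * sum_m f_m
    # E[(x* - c_m + xi)^2] = (x* - c_m)^2 + s_m^2 because E[xi] = 0
    return sum((x_star - c) ** 2 + s ** 2 for c, s in zip(centers, noise_stds)) / m

def dissimilarity_monte_carlo(centers, noise_stds, samples=20000, seed=0):
    """Monte Carlo estimate of the same expectation, taken straight from its definition."""
    rng = random.Random(seed)
    m = len(centers)
    x_star = sum(centers) / m
    total = 0.0
    for c, s in zip(centers, noise_stds):
        for _ in range(samples):
            g = (x_star - c) + rng.gauss(0.0, s)  # one stochastic gradient of f_m at x*
            total += g * g
    return total / (m * samples)

# Identical data: every c_m equals x*, so only the noise variance 0.25 remains.
identical = dissimilarity_at_optimum([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
# Heterogeneous data: the spread of the c_m around x* = 0 adds 2/3 on top of 0.25.
heterogeneous = dissimilarity_at_optimum([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
print(identical, heterogeneous)
```

In this toy model the heterogeneous term stays positive even when every device's gradient noise is zero, which is why a variance measured only through the noise would understate the difficulty of the heterogeneous regime.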
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide the optimal stepsize and the optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by the fact that they recover known statements when we plug in $H=1$, where $H$ is the number of local steps. The empirical evidence further confirms the severe impact of data heterogeneity on the performance of local SGD.
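The role of $H$ can be illustrated with a minimal local SGD loop on a toy problem. In this sketch, which is an assumption and not the paper's experimental setup, each of $M$ workers holds $f_m(x) = \tfrac12 a_m (x - c_m)^2$, runs $H$ stochastic gradient steps from the shared iterate, and the workers' iterates are then averaged. A minibatch SGD baseline is included to show that the $H=1$ case mentioned above reduces to it. The function names and the quadratic model are hypothetical.

```python
import random

def local_sgd(centers, curvatures, noise_std, stepsize, local_steps, rounds, x0=0.0, seed=0):
    """Local SGD on hypothetical scalar quadratics f_m(x) = 0.5 * a_m * (x - c_m)**2.

    Each round, every worker starts from the shared iterate, takes
    `local_steps` (H) stochastic gradient steps on its own f_m, and the
    shared iterate is replaced by the average of the workers' models.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_models = []
        for c, a in zip(centers, curvatures):
            y = x
            for _ in range(local_steps):
                grad = a * (y - c) + rng.gauss(0.0, noise_std)  # unbiased noisy gradient
                y -= stepsize * grad
            local_models.append(y)
        x = sum(local_models) / len(local_models)  # communication step: average models
    return x

def minibatch_sgd(centers, curvatures, noise_std, stepsize, rounds, x0=0.0, seed=0):
    """One step per round along the average of one stochastic gradient per worker."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        grads = [a * (x - c) + rng.gauss(0.0, noise_std) for c, a in zip(centers, curvatures)]
        x -= stepsize * sum(grads) / len(grads)
    return x

# Heterogeneous toy data: the minimizer of the average objective is x* = 0.75.
centers, curvatures = [0.0, 1.0], [1.0, 3.0]
for H in (1, 10):
    x = local_sgd(centers, curvatures, noise_std=0.0, stepsize=0.1, local_steps=H, rounds=200)
    print(H, x)
```

With noise-free gradients, one round maps $x$ to the average of $c_m + q_m (x - c_m)$ with $q_m = (1 - \gamma a_m)^H$, so the iterates settle at $\sum_m c_m (1 - q_m) / \sum_m (1 - q_m)$. This equals $x_* = \sum_m a_m c_m / \sum_m a_m$ when $H = 1$, when all $c_m$ coincide, or when all $a_m$ are equal, but not in general: for the data above with $\gamma = 0.1$ and $H = 10$ it is about 0.599 instead of 0.75. This is one simple mechanism by which data heterogeneity can hurt local methods run with a constant stepsize.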