LG, OC · Jan 30, 2022

Faster Convergence of Local SGD for Over-Parameterized Models

arXiv:2201.12719v3 · 11 citations
Originality: Incremental advance
AI Analysis

This work analyzes the convergence of federated learning for over-parameterized models, offering incremental improvements to the existing theoretical analysis.

The paper studies the convergence of Local SGD (FedAvg) for over-parameterized models in heterogeneous data settings. It establishes error bounds of O(1/T) for convex losses under a mild data similarity assumption and O(K/T) otherwise (convex losses without that assumption, and non-convex losses), improving on the previous best bound of O(1/√(nT)).
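To get a feel for how the three rates compare, here is a small arithmetic sketch. It assumes that every constant hidden inside the O(·) notation equals 1, which the paper does not claim; the function name `bound_magnitudes` is hypothetical and used only for illustration.

```python
import math


def bound_magnitudes(T, K, n):
    """Leading-order bound values with every hidden constant set to 1.

    Setting the constants to 1 is an illustrative assumption; the actual
    bounds hide problem-dependent constants inside the O(.) notation.
    """
    return {
        "O(1/T)": 1.0 / T,                      # convex, with data similarity
        "O(K/T)": K / T,                        # convex without it, or non-convex
        "O(1/sqrt(nT))": 1.0 / math.sqrt(n * T),  # previous best bound
    }


for T in (10**3, 10**4, 10**5):
    vals = bound_magnitudes(T, K=10, n=10)
    print(T, {name: f"{v:.1e}" for name, v in vals.items()})
```

With constants ignored, K/T < 1/√(nT) holds exactly when K < √(T/n). For K = n = 10 the two are equal at T = 1000, and the O(K/T) bound is the smaller one for every larger T.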

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of $\mathcal{O}(1/T)$ under a mild data similarity assumption and an error bound of $\mathcal{O}(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations. For non-convex loss functions, we prove an error bound of $\mathcal{O}(K/T)$. In both cases, these bounds improve upon the best previous bound of $\mathcal{O}(1/\sqrt{nT})$, where $n$ is the number of nodes, which was obtained without assuming that the model is over-parameterized. We complete our results by providing problem instances in which our established convergence rates are tight up to a constant factor for a reasonably small stepsize. Finally, we validate our theoretical results with large-scale numerical experiments on practical over-parameterized deep learning models; these experiments reveal the convergence behavior of Local SGD and clearly exhibit its $\mathcal{O}(1/T)$ convergence rate.
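The Local SGD (FedAvg) loop analyzed above can be sketched as follows: each of the $n$ nodes starts from the shared model, takes $K$ local stochastic gradient steps on its own data, and the resulting models are averaged. This is a minimal toy sketch, not the authors' code or experimental setup. It assumes per-node least-squares losses, unit-norm feature rows, node-specific feature shifts to mimic heterogeneous data, and labels generated from one shared vector `x_star` so that a common interpolating solution exists (the over-parameterized regime, with fewer total samples than parameters). The names `local_sgd`, `global_loss`, and `make_problem` are hypothetical, and the example is not meant to reproduce the paper's rates.

```python
import numpy as np


def local_sgd(A_list, b_list, K, rounds, lr, seed=0):
    """Local SGD (FedAvg-style) on per-node least-squares losses.

    Node i minimizes f_i(x) = mean_j 0.5 * (A_i[j] @ x - b_i[j])**2.
    Each round: every node starts from the shared model, takes K
    single-sample SGD steps on its own data, then the models are averaged.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(A_list[0].shape[1])  # shared model, initialized at zero
    for _ in range(rounds):
        local_models = []
        for A, b in zip(A_list, b_list):
            y = x.copy()
            for _ in range(K):
                j = rng.integers(A.shape[0])        # sample one local data point
                y -= lr * (A[j] @ y - b[j]) * A[j]  # stochastic gradient step
            local_models.append(y)
        x = np.mean(local_models, axis=0)  # server-side averaging
    return x


def global_loss(x, A_list, b_list):
    """Average of the node losses f_i(x)."""
    return float(np.mean([0.5 * np.mean((A @ x - b) ** 2) for A, b in zip(A_list, b_list)]))


def make_problem(n=4, m=10, d=100, seed=1):
    """Hypothetical heterogeneous, over-parameterized instance (n*m < d).

    Each node's features have a different mean shift, and all labels come
    from one shared x_star, so a common interpolating solution exists.
    """
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=d)
    A_list, b_list = [], []
    for i in range(n):
        A = rng.normal(loc=0.2 * i, size=(m, d))
        A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows, so lr < 2 is stable
        A_list.append(A)
        b_list.append(A @ x_star)
    return A_list, b_list


A_list, b_list = make_problem()
x0_loss = global_loss(np.zeros(100), A_list, b_list)
x_hat = local_sgd(A_list, b_list, K=10, rounds=500, lr=0.5)
print(f"initial loss {x0_loss:.3e}, final loss {global_loss(x_hat, A_list, b_list):.3e}")
```

Because every node's data is consistent with the same `x_star`, the node losses share a minimizer with zero loss, so local steps on heterogeneous data need not pull the averaged model away from a global solution. This interpolation property is what the over-parameterized analysis exploits.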
