Federated Stochastic Gradient Descent Begets Self-Induced Momentum
This work offers insights for system designers by linking staleness analysis to federated computing systems, though the contribution is incremental because it builds on existing Federated SGD methods.
The paper addresses convergence analysis in federated learning. It shows that running stochastic gradient descent in this setting introduces a momentum-like term into the global aggregation process, and it analyzes the convergence rate while accounting for parameter staleness and communication resources.
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm, and also forge a link between staleness analysis and federated computing systems, which can be useful for systems designers.
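As a rough illustration of how a history-dependent term can appear in the global aggregation, the sketch below simulates a toy setup that is an assumption made here, not necessarily the paper's exact model. Each round, a random subset of clients takes one gradient step from the current global parameter, the remaining clients keep their stale parameters, and the server averages all client parameters. The quadratic local objectives and the names `local_grad`, `federated_round`, `decomposed_update`, and `run` are hypothetical. Under these assumptions the averaged update can be rewritten as a scaled gradient step plus a term built from differences between the current global parameter and older client parameters, which is the structural shape of a momentum term.

```python
import random


def local_grad(w, center):
    # Gradient of the toy local objective 0.5 * (w - center)**2 (an assumption).
    return w - center


def federated_round(global_w, client_w, centers, selected, lr):
    """One round: selected clients step from the global model, the rest stay stale."""
    new_client_w = list(client_w)
    for i in selected:
        new_client_w[i] = global_w - lr * local_grad(global_w, centers[i])
    # The server averages over ALL clients, including the stale ones.
    new_global = sum(new_client_w) / len(new_client_w)
    return new_global, new_client_w


def decomposed_update(global_w, client_w, centers, selected, lr):
    """The same update, written as gradient step + history-dependent term."""
    n = len(client_w)
    grad_step = -lr / n * sum(local_grad(global_w, centers[i]) for i in selected)
    # Differences between the current global model and the older parameters
    # held by the selected clients: this is the memory-carrying part.
    history_term = sum(global_w - client_w[i] for i in selected) / n
    return global_w + grad_step + history_term


def run(num_clients=10, num_selected=3, rounds=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    centers = [rng.uniform(-1.0, 1.0) for _ in range(num_clients)]
    client_w = [0.0] * num_clients
    global_w = 0.0  # equals the average of client_w at the start
    max_gap = 0.0
    for _ in range(rounds):
        selected = rng.sample(range(num_clients), num_selected)
        predicted = decomposed_update(global_w, client_w, centers, selected, lr)
        global_w, client_w = federated_round(global_w, client_w, centers, selected, lr)
        max_gap = max(max_gap, abs(predicted - global_w))
    return global_w, max_gap


if __name__ == "__main__":
    final_w, max_gap = run()
    print("final global parameter:", final_w)
    print("largest gap between the two forms of the update:", max_gap)
```

The two forms agree up to floating-point error in every round, so the history term is not an extra ingredient but a consequence of averaging fresh and stale parameters together. How that term behaves in expectation, and how it affects the convergence rate, depends on the sampling and staleness model, which is what the paper's analysis addresses.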