Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
This paper offers researchers theoretical insight into federated learning convergence, though the contribution is incremental: it applies a new analytical perspective to existing methods.
The paper analyzes the convergence behavior of federated learning algorithms on non-IID data by identifying implicit regularizers through backward error analysis, showing that FedAvg's regularizer increases gradient variance and hampers convergence, while FedSAM and SCAFFOLD partially or fully mitigate biases to improve convergence.
Backward error analysis finds a modified loss function that the parameter updates actually follow under a given optimization method. The additional loss terms in this modified function are called the implicit regularizer. In this paper, we attempt to find the implicit regularizers of various federated learning algorithms on non-IID data distributions and explain why each method shows different convergence behavior. We first show that the implicit regularizer of FedAvg disperses the gradient of each client away from the average gradient, thus increasing the gradient variance. We also show empirically that this implicit regularizer hampers the convergence of FedAvg. Similarly, we compute the implicit regularizers of FedSAM and SCAFFOLD and explain why they converge better. While existing convergence analyses focus on the advantages of FedSAM and SCAFFOLD, our approach can also explain their limitations in complex non-convex settings. Specifically, we demonstrate that FedSAM partially removes the bias in the first-order term of FedAvg's implicit regularizer, whereas SCAFFOLD fully eliminates the bias in the first-order term but not in the second-order term. Consequently, the implicit regularizer provides useful insight into the convergence behavior of federated learning from a different theoretical perspective.
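To make the idea of a modified loss concrete, the following minimal sketch uses plain single-machine gradient descent rather than any of the federated algorithms analyzed in the paper. It relies on the standard result that gradient descent with step size h on a loss L follows, to leading order, the gradient flow of the modified loss L + (h/4)·||∇L||². The quadratic loss, the function names, and the parameter values are all assumptions chosen for illustration; the paper's federated regularizers are not reproduced here.

```python
import math

def gd_step(theta, a, h):
    """One gradient-descent step on the toy loss L(theta) = 0.5 * a * theta**2."""
    return theta - h * a * theta

def flow(theta0, rate, t):
    """Exact solution at time t of the linear flow d(theta)/dt = -rate * theta."""
    return theta0 * math.exp(-rate * t)

def compare(theta0=1.0, a=1.0, h=0.1, steps=10):
    """Return the distance of the GD iterate from the original and modified flows."""
    theta_gd = theta0
    for _ in range(steps):
        theta_gd = gd_step(theta_gd, a, h)
    t = steps * h

    # Gradient flow of the original loss L.
    theta_orig = flow(theta0, a, t)

    # Gradient flow of the modified loss
    #   L + (h/4) * (dL/dtheta)**2 = 0.5 * a * (1 + h*a/2) * theta**2,
    # where the extra term is the implicit regularizer of plain GD.
    theta_mod = flow(theta0, a * (1.0 + h * a / 2.0), t)

    return abs(theta_gd - theta_orig), abs(theta_gd - theta_mod)

err_original, err_modified = compare()
print("distance to original flow:", err_original)
print("distance to modified flow:", err_modified)
```

In this toy setting the discrete iterates stay noticeably closer to the flow of the modified loss than to the flow of the original loss, which is the sense in which the updates "actually follow" the modified function.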