LG, Jun 9, 2022

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

arXiv:2206.04723v1, 46 citations, h-index: 45
Originality: Synthesis-oriented
AI Analysis

This paper addresses a theoretical gap in federated learning that matters to both researchers and practitioners: it explains the empirical success of FedAvg. The contribution is incremental, since it refines existing theory rather than introducing a new method.

The paper explains why Federated Averaging (FedAvg) works well in practice despite theoretical predictions that data heterogeneity degrades its performance. It argues that the key assumption behind those predictions, bounded gradient dissimilarity, is too pessimistic, and it proposes a new measure, average drift at optimum. This quantity is nearly zero in many real-world tasks, and the resulting analysis suggests that FedAvg can have identical convergence rates in homogeneous and heterogeneous settings.

Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions. We find that the key assumption of bounded gradient dissimilarity in previous theoretical analyses is too pessimistic to characterize data heterogeneity in practical applications. For a simple quadratic problem, we demonstrate that there exist regimes where large gradient dissimilarity does not have any negative impact on the convergence of FedAvg. Motivated by this observation, we propose a new quantity, average drift at optimum, to measure the effects of data heterogeneity, and explicitly use it to present a new theoretical analysis of FedAvg. We show that the average drift at optimum is nearly zero across many real-world federated training tasks, whereas the gradient dissimilarity can be large. Our new analysis suggests that FedAvg can have identical convergence rates in homogeneous and heterogeneous data settings, and hence leads to a better understanding of its empirical success.
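The quadratic-problem claim can be illustrated with a small sketch. It is not the paper's construction: the one-dimensional client objectives f_i(x) = 0.5 * a_i * (x - c_i)^2, the function names, and the step-size choices are all assumptions made for illustration. The drift computed here is one plausible reading of "average drift at optimum" (start every client at the global optimum, run the local steps, average the displacements, and scale by 1/(lr * steps)); the paper's exact definition may differ.

```python
# Toy FedAvg on 1-D quadratics: f_i(x) = 0.5 * a_i * (x - c_i)**2.
# Each client is a hypothetical (a_i, c_i) pair; nothing here is taken from the paper.

def local_gd(x, a, c, lr, steps):
    """Run `steps` gradient-descent steps on f(x) = 0.5 * a * (x - c)**2."""
    for _ in range(steps):
        x = x - lr * a * (x - c)
    return x


def fedavg_round(x, clients, lr, steps):
    """One FedAvg round: all clients start from x, the server averages the results."""
    local_models = [local_gd(x, a, c, lr, steps) for a, c in clients]
    return sum(local_models) / len(local_models)


def global_optimum(clients):
    """Minimizer of the average objective (1/M) * sum_i f_i(x)."""
    return sum(a * c for a, c in clients) / sum(a for a, _ in clients)


def gradient_dissimilarity(clients, x):
    """Mean squared deviation of client gradients from their average at x."""
    grads = [a * (x - c) for a, c in clients]
    mean_grad = sum(grads) / len(grads)
    return sum((g - mean_grad) ** 2 for g in grads) / len(grads)


def average_drift_at_optimum(clients, lr, steps):
    """Assumed reading: size of the averaged local update when starting at x*."""
    x_star = global_optimum(clients)
    return abs(fedavg_round(x_star, clients, lr, steps) - x_star) / (lr * steps)


if __name__ == "__main__":
    lr, steps = 0.1, 5
    # Same curvature, far-apart minimizers: client gradients disagree strongly at x*.
    same_curvature = [(1.0, -100.0), (1.0, 100.0)]
    # Different curvatures: local updates no longer cancel at x*.
    mixed_curvature = [(1.0, -100.0), (3.0, 100.0)]

    for name, clients in [("same", same_curvature), ("mixed", mixed_curvature)]:
        x_star = global_optimum(clients)
        x = 50.0 + x_star  # start 50 units away from the optimum
        for _ in range(50):
            x = fedavg_round(x, clients, lr, steps)
        print(name,
              "dissimilarity:", gradient_dissimilarity(clients, x_star),
              "drift:", average_drift_at_optimum(clients, lr, steps),
              "final error:", abs(x - x_star))
```

In the same-curvature setup the client gradients at the optimum are large and opposite, so the gradient dissimilarity is large, yet the local updates cancel on average and FedAvg still reaches the global optimum. The mixed-curvature setup is included only as a contrast in which the drift under this assumed definition is not zero.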

