LG · AI · Aug 18, 2025

Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg

arXiv:2508.12576v1 · 2 citations · h-index: 1 · ICML
AI Analysis

This work addresses the challenge of training effective global models across non-IID data distributions in federated learning, providing theoretical insights that could improve scalability and generalization in decentralized settings.

The paper tackles data heterogeneity in federated learning by analyzing FedAvg with overparameterized networks. It proves that increasing network width reduces, and in the limit eliminates, the impact of heterogeneity. It also shows that in the infinite-width regime FedAvg matches the generalization performance of centralized learning given the same number of gradient descent iterations.

Federated learning (FL) enables decentralized clients to train a model collaboratively without sharing local data. A key distinction between FL and centralized learning is that clients' data are not independent and identically distributed (non-IID), which poses significant challenges in training a global model that generalizes well across heterogeneous local data distributions. In this paper, we analyze the convergence of overparameterized FedAvg with gradient descent (GD). We prove that the impact of data heterogeneity diminishes as the width of neural networks increases, ultimately vanishing when the width approaches infinity. In the infinite-width regime, we further prove that both the global and local models in FedAvg behave as linear models, and that FedAvg achieves the same generalization performance as centralized learning with the same number of GD iterations. Extensive experiments validate our theoretical findings across various network architectures, loss functions, and optimization methods.
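To make the FedAvg-with-GD procedure described above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's experimental setup: it uses a plain linear regression model with squared loss instead of a wide neural network, and the helper names (`local_gd`, `fedavg`, `make_clients`), the synthetic data, and all hyperparameters are hypothetical. The sketch only demonstrates the mechanics of local GD followed by server averaging, plus one elementary fact: for a single local step and equal-sized clients, one FedAvg round coincides with one centralized GD step on the pooled data.

```python
import numpy as np

def local_gd(w, X, y, lr, steps):
    """Run `steps` full-batch GD iterations on mean squared error, starting from w."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(clients, w0, lr, rounds, local_steps):
    """Each round: every client runs local GD from the global model,
    then the server averages the local models weighted by sample count."""
    w = w0.copy()
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        local_models = [local_gd(w, X, y, lr, local_steps) for X, y in clients]
        w = sum(len(y) / total * w_k for w_k, (_, y) in zip(local_models, clients))
    return w

def make_clients(num_clients=4, n=50, d=5, seed=0):
    """Synthetic non-IID clients (an assumption for illustration):
    client k draws features centred at k, so feature distributions differ."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    clients = []
    for k in range(num_clients):
        X = rng.normal(loc=float(k), scale=1.0, size=(n, d))
        y = X @ w_true + 0.1 * rng.normal(size=n)
        clients.append((X, y))
    return clients

clients = make_clients()
w0 = np.zeros(5)
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])

# One local step per round: identical to centralized GD on the pooled data.
w_fed_1 = fedavg(clients, w0, lr=1e-3, rounds=200, local_steps=1)
w_central = local_gd(w0, X_all, y_all, lr=1e-3, steps=200)

# Several local steps per round: the same total number of GD iterations,
# but the iterates now generally depend on how the data is split.
w_fed_5 = fedavg(clients, w0, lr=1e-3, rounds=40, local_steps=5)

print("gap (1 local step): ", np.linalg.norm(w_fed_1 - w_central))
print("gap (5 local steps):", np.linalg.norm(w_fed_5 - w_central))
```

Varying `local_steps` and the per-client feature shift in `make_clients` gives a quick way to see how heterogeneity interacts with multiple local updates in this toy setting. It does not reproduce the width-dependent analysis of the paper.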
