Revisiting Weighted Aggregation in Federated Learning with Neural Networks
This work addresses the challenge of improving generalization in federated learning for distributed systems, though it is incremental as it builds on existing aggregation methods.
The paper tackles the problem of weighted aggregation in federated learning by revealing that allowing the sum of weights to be less than one acts like weight decay to improve generalization, and it proposes FedLAW, a method that enhances generalization by a large margin across datasets and models.
In federated learning (FL), weighted aggregation of local models is conducted to generate a global model, and the aggregation weights are normalized (the sum of weights is 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of weights can be smaller than 1, causing global weight shrinking effect (analogous to weight decay) and improving generalization. We explore how the optimal shrinking factor is affected by clients' data heterogeneity and local epochs. Second, we dive into the relative aggregation weights among clients to depict the clients' importance. We develop client coherence to study the learning dynamics and find a critical point that exists. Before entering the critical point, more coherent clients play more essential roles in generalization. Based on the above insights, we propose an effective method for Federated Learning with Learnable Aggregation Weights, named as FedLAW. Extensive experiments verify that our method can improve the generalization of the global model by a large margin on different datasets and models.