LG DC MLAug 16, 2019

Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs

Xin Yao, Tianchi Huang, Chenglei Wu, Rui-Xiao Zhang, Lifeng Sun

arXiv:1908.05891v210.741 citations

Originality Incremental advance

AI Analysis

This addresses efficiency and performance issues in federated learning for distributed networks like smartphones and IoT devices, but it is incremental as it builds on existing FedAvg methods.

The paper tackles the problem of high communication costs and performance drops in federated learning, especially with non-IID data, by proposing two methods: FedMMD reduces communication rounds by over 20%, and FedFusion reduces them by over 60% while improving accuracy and generalization.

Federated learning (FL) enables on-device training over distributed networks consisting of a massive amount of modern smart devices, such as smartphones and IoT (Internet of Things) devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and the inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training. The first (FedMMD) is adopting a two-stream model with the MMD (Maximum Mean Discrepancy) constraint instead of a single model in vanilla FedAvg to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds. The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at fewer communication costs. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.

View on arXiv PDF

Similar