LG AI DC | Feb 3, 2022

Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT

Zonghang Li, Yihong He, Hongfang Yu, Jiawen Kang, Xiaoping Li, Zenglin Xu, Dusit Niyato

arXiv:2202.01512v115.6137 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data heterogeneity and security issues in industrial IoT to improve federated learning efficiency, though it is incremental because it builds on existing FL methods with specific optimizations.

The paper tackles performance degradation in federated learning caused by non-i.i.d. data in industrial IoT. It proposes FedGS, a framework that selects homogeneous device groups and coordinates them with a synchronization protocol, yielding a 3.5% accuracy improvement and a 59% reduction in training rounds on average.

The industrial Internet of Things (IIoT) plays an integral role in Industry 4.0 and produces massive amounts of data for industrial intelligence. These data reside on decentralized devices in modern factories. To protect the confidentiality of industrial data, federated learning (FL) was introduced to collaboratively train shared machine learning models. However, the local data collected by different devices are skewed in class distribution, which degrades industrial FL performance. This challenge has been widely studied at the mobile edge, but existing approaches ignore the rapidly changing streaming data and the clustered nature of factory devices and, more seriously, may threaten data security. In this paper, we propose FedGS, a hierarchical cloud-edge-end FL framework for 5G-empowered industries, to improve industrial FL performance on non-i.i.d. data. Taking advantage of naturally clustered factory devices, FedGS uses a gradient-based binary permutation algorithm (GBP-CS) to select a subset of devices within each factory and build homogeneous super nodes that participate in FL training. We then propose a compound-step synchronization protocol to coordinate the training process within and among these super nodes, which shows great robustness against data heterogeneity. The proposed methods are time-efficient and can adapt to dynamic environments without exposing confidential industrial data to risky manipulation. We prove that FedGS has better convergence performance than FedAvg and give a relaxed condition under which FedGS is more communication-efficient. Extensive experiments show that FedGS improves accuracy by 3.5% and reduces training rounds by 59% on average, confirming its superior effectiveness and efficiency on non-i.i.d. data.
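
To make the idea of a "homogeneous super node" concrete, here is a minimal Python sketch of group client selection. It is not GBP-CS, whose details the abstract does not give. It swaps in a simple greedy heuristic that picks devices so that their pooled class histogram stays close to a target distribution. The function names, the L2 distance, and the use of raw per-device class counts are all assumptions made for illustration. In particular, sharing raw class counts may not match the confidentiality constraints the paper aims for.

```python
from typing import List, Sequence, Tuple


def class_distribution(counts: Sequence[float]) -> List[float]:
    """Normalize per-class sample counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def l2_gap(counts: Sequence[float], target: Sequence[float]) -> float:
    """L2 distance between the normalized counts and the target distribution."""
    dist = class_distribution(counts)
    return sum((d - t) ** 2 for d, t in zip(dist, target)) ** 0.5


def greedy_select(
    device_counts: Sequence[Sequence[float]],
    target: Sequence[float],
    group_size: int,
) -> Tuple[List[int], List[float]]:
    """Greedy stand-in for group client selection (hypothetical, not GBP-CS).

    At each step, add the device whose data brings the pooled class
    histogram of the group closest to `target`.
    Returns the selected device indices and the pooled class counts.
    """
    selected: List[int] = []
    pooled = [0.0] * len(target)
    remaining = set(range(len(device_counts)))
    for _ in range(min(group_size, len(device_counts))):
        # Iterate in sorted order so ties are broken deterministically.
        best = min(
            sorted(remaining),
            key=lambda i: l2_gap(
                [p + c for p, c in zip(pooled, device_counts[i])], target
            ),
        )
        selected.append(best)
        remaining.remove(best)
        pooled = [p + c for p, c in zip(pooled, device_counts[best])]
    return selected, pooled


# Toy factory: four devices, two classes, each device heavily skewed.
devices = [[90, 10], [10, 90], [80, 20], [20, 80]]
group, pooled_counts = greedy_select(devices, target=[0.5, 0.5], group_size=2)
print(group, class_distribution(pooled_counts))
```

In this toy setting every device is skewed, yet the selected pair pools to a balanced class distribution, which is the property a homogeneous super node is meant to have. A real system would also need to handle streaming data and privacy, which this sketch ignores.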
