Clustered Data Sharing for Non-IID Federated Learning over Wireless Networks
This paper addresses statistical imbalance in federated learning for IoT applications. The contribution is incremental: it builds on existing FL methods and adds a new clustering approach.
The paper tackles the problem of non-IID data in federated learning, which causes high communication costs and accuracy declines, by proposing a clustered data sharing framework that improves convergence and model accuracy in limited communication environments.
Federated Learning (FL) is a novel distributed machine learning approach that leverages data from Internet of Things (IoT) devices while maintaining data privacy. However, current FL algorithms struggle with non-independent and identically distributed (non-IID) data, which causes high communication costs and declines in model accuracy. To address the statistical imbalances in FL, we propose a clustered data sharing framework in which cluster heads share part of their data with credible associates through device-to-device (D2D) communication. Moreover, to dilute the data skew on nodes, we formulate the joint clustering and data sharing problem on a privacy-preserving constrained graph. To tackle the strong coupling of decisions on the graph, we devise a distribution-based adaptive clustering algorithm (DACA) based on three deductive cluster-forming conditions, which ensures the maximum yield of data sharing. Experiments show that the proposed framework enables FL on non-IID datasets to achieve better convergence and model accuracy in a limited communication environment.
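The idea of diluting data skew through sharing can be illustrated with a toy example. The sketch below is not DACA and does not model the privacy-constrained graph or D2D links; it only assumes a single, already-formed cluster whose head sends a fixed number of samples to each associate, and it measures skew as the L1 distance between a node's label distribution and the pooled label distribution. All names (`label_distribution`, `skew`, `share_from_head`, `share_count`) and the sample data are hypothetical.

```python
from collections import Counter


def label_distribution(labels, num_classes):
    """Return the fraction of samples per class for one node."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def skew(labels, global_dist):
    """L1 distance between a node's label distribution and the global one."""
    local = label_distribution(labels, len(global_dist))
    return sum(abs(p - q) for p, q in zip(local, global_dist))


def share_from_head(head_labels, member_labels, share_count):
    """Return the member's labels after receiving the head's first
    `share_count` samples (an assumed, deliberately simple sharing rule)."""
    return member_labels + head_labels[:share_count]


NUM_CLASSES = 4

# Hypothetical cluster: a head with balanced data and two skewed associates.
head = [0, 1, 2, 3] * 5
members = {
    "A": [0] * 10,            # only class 0
    "B": [1] * 5 + [2] * 5,   # only classes 1 and 2
}

# Global distribution over all data held in the cluster.
pooled = head + [y for labels in members.values() for y in labels]
global_dist = label_distribution(pooled, NUM_CLASSES)

# Skew of each associate before and after receiving 8 samples from the head.
before = {name: skew(labels, global_dist) for name, labels in members.items()}
after = {
    name: skew(share_from_head(head, labels, share_count=8), global_dist)
    for name, labels in members.items()
}

for name in members:
    print(f"node {name}: skew before={before[name]:.3f}, after={after[name]:.3f}")
```

In this toy setting, each associate's label distribution moves closer to the pooled distribution after receiving the head's samples. Which nodes to cluster and how much data to share are the coupled decisions that the paper's joint formulation addresses; this sketch fixes both by hand.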