LG AI | Sep 26, 2024

Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

Xiufang Shi, Wei Zhang, Mincheng Wu, Guangyi Liu, Zhenyu Wen, Shibo He, Tejal Shah, Rajiv Ranjan

arXiv:2409.17517v2 | 6.43 citations | h-index: 73

Originality: Incremental advance

AI Analysis

This work addresses performance issues in federated learning for IoT applications, but the advance is incremental: it combines existing methods into a hybrid approach.

The paper tackles the challenges of statistical heterogeneity and high communication overhead in federated learning for IoT by proposing HFLDD, a hybrid framework that uses dataset distillation to generate approximately IID data, resulting in improved test accuracy and reduced communication cost on severely imbalanced datasets.

With the development of edge computing, Federated Learning (FL) has emerged as a promising solution for the intelligent Internet of Things (IoT). However, applying FL in mobile edge-cloud networks is greatly challenged by statistical heterogeneity and high communication overhead. To address these challenges, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and identically distributed (IID) data, thereby improving the performance of model training. In particular, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster heads collect distilled data from the corresponding cluster members and conduct model training in collaboration with the server. This training process resembles traditional federated learning on IID data, and hence effectively alleviates the impact of non-IID data on model training. We perform a comprehensive analysis of the convergence behavior, communication overhead, and computational complexity of the proposed HFLDD. Extensive experimental results based on multiple public datasets demonstrate that when data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.
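The abstract does not say how the heterogeneous clusters are formed. As a rough illustration of the goal only (clients with skewed labels grouped so that each cluster's pooled labels are close to balanced), the sketch below uses a simple greedy heuristic. This heuristic is an assumption chosen for illustration and is not claimed to be HFLDD's clustering procedure; the function names `label_histogram`, `imbalance`, and `greedy_heterogeneous_clusters` are hypothetical.

```python
from collections import Counter


def label_histogram(labels, num_classes):
    """Count how many samples of each class a client holds."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(num_classes)]


def imbalance(hist):
    """Sum of absolute deviations from a uniform histogram (0 = balanced)."""
    mean = sum(hist) / len(hist)
    return sum(abs(h - mean) for h in hist)


def greedy_heterogeneous_clusters(client_hists, num_clusters):
    """Assign each client to the cluster whose pooled labels stay most balanced.

    Illustrative heuristic only (an assumption, not the paper's method):
    clients are visited from largest to smallest, and each one joins the
    cluster with the lowest pooled imbalance after the merge. Ties go to
    the cluster with fewer members, then to the lower cluster index.
    """
    num_classes = len(client_hists[0])
    clusters = [[] for _ in range(num_clusters)]
    pooled = [[0] * num_classes for _ in range(num_clusters)]

    order = sorted(range(len(client_hists)), key=lambda i: -sum(client_hists[i]))
    for i in order:
        def cost(k):
            merged = [a + b for a, b in zip(pooled[k], client_hists[i])]
            return (imbalance(merged), len(clusters[k]))

        best = min(range(num_clusters), key=cost)
        clusters[best].append(i)
        pooled[best] = [a + b for a, b in zip(pooled[best], client_hists[i])]
    return clusters, pooled


if __name__ == "__main__":
    # Four clients, each holding a single class (severely non-IID).
    hists = [
        label_histogram([0] * 10, 2),
        label_histogram([0] * 10, 2),
        label_histogram([1] * 10, 2),
        label_histogram([1] * 10, 2),
    ]
    clusters, pooled = greedy_heterogeneous_clusters(hists, num_clusters=2)
    print(clusters, pooled)
```

In this toy setup every client is individually unbalanced, yet each cluster's pooled histogram ends up uniform, which is the property the abstract attributes to the clusters. In the framework described above, the cluster head would then train on distilled data gathered from its members rather than on the raw samples.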
