Addressing Heterogeneity in Federated Learning: Challenges and Solutions for a Shared Production Environment
The paper addresses data heterogeneity in federated learning for manufacturing, but its contribution is incremental: it synthesizes existing research and proposes new strategies without introducing a novel paradigm.
This paper tackles the problem of data heterogeneity in federated learning for manufacturing environments, reviewing challenges like non-IID data and proposing strategies such as personalized models and robust aggregation to enhance model robustness and training efficiency.
Federated learning (FL) has emerged as a promising approach to training machine learning models across decentralized data sources while preserving data privacy, particularly in manufacturing and shared production environments. However, data heterogeneity, that is, variation in data distribution, quality, and volume across clients and production sites, poses significant challenges to the effectiveness and efficiency of FL. This paper provides a comprehensive overview of heterogeneity in FL within the context of manufacturing, detailing its types and sources, including non-independent and identically distributed (non-IID) data, unbalanced data, variable data quality, and statistical heterogeneity. We discuss the impact of each type of heterogeneity on model training and review current methodologies for mitigating its adverse effects. These methodologies include personalized and customized models, robust aggregation techniques, and client selection techniques. By synthesizing existing research and proposing new strategies, this paper aims to provide insight into effectively managing data heterogeneity in FL, enhancing model robustness, and ensuring fair and efficient training across diverse environments. Future research directions are also identified, highlighting the need for adaptive and scalable solutions to further improve the FL paradigm in the context of Industry 4.0.
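To make the effect of unbalanced data on aggregation concrete, the following minimal sketch contrasts a sample-weighted average (in the style of FedAvg) with a coordinate-wise median, one commonly used robust aggregator. It is an illustration under stated assumptions, not the method proposed in the paper: the function names, the four client updates, and the sample counts are all hypothetical.

```python
from statistics import median


def weighted_average(updates, sizes):
    """FedAvg-style aggregation: average client vectors weighted by sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(dim)]


def coordinate_median(updates):
    """Robust alternative: take the median of each coordinate across clients."""
    dim = len(updates[0])
    return [median(u[i] for u in updates) for i in range(dim)]


# Hypothetical model updates from four production sites (two parameters each).
# Three sites agree; the fourth holds most of the data and diverges strongly.
updates = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [9.0, -6.0]]
sizes = [100, 100, 100, 700]

fedavg = weighted_average(updates, sizes)   # pulled toward the large site
robust = coordinate_median(updates)         # stays with the majority of sites
print(fedavg, robust)
```

In this toy setting the weighted average is dominated by the single large site, while the median follows the three sites that agree. Which behavior is preferable depends on whether the large site's data is representative or of poor quality, which is the kind of trade-off the robust aggregation and client selection methods surveyed here aim to manage.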