DC AI NI | Dec 6, 2024

NebulaFL: Effective Asynchronous Federated Learning for JointCloud Computing

Fei Gao, Ming Hu, Zhiyu Xie, Peichang Shi, Xiaofei Xie, Guodong Yi, Huaimin Wang

arXiv:2412.04868v1 | 2.3 | h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses performance and cost issues in cloud-based federated learning for data owners and service providers, presenting an incremental improvement over existing FL methods.

The paper tackles three challenges in Federated Learning as a Service (FLaaS) for JointCloud Computing: data heterogeneity, high communication overhead, and inefficient resource scheduling. It proposes NebulaFL, which, compared with state-of-the-art methods, achieves up to 5.71% higher accuracy and, under a target accuracy, reduces communication overhead by up to 50% and cost by up to 61.94%.

With advancements in AI infrastructure and Trusted Execution Environment (TEE) technology, Federated Learning as a Service (FLaaS) through JointCloud Computing (JCC) promises to break through the resource constraints caused by heterogeneous edge devices in the traditional Federated Learning (FL) paradigm. Specifically, with the protection of TEE, data owners can achieve efficient model training with high-performance AI services in the cloud. By providing additional FL services, cloud service providers can enable collaborative learning among data owners. However, FLaaS still faces three challenges: i) low training performance caused by heterogeneous data among data owners, ii) high communication overhead among different clouds (i.e., data centers), and iii) a lack of efficient resource scheduling strategies to balance training time and cost. To address these challenges, this paper presents a novel asynchronous FL approach named NebulaFL for collaborative model training among multiple clouds. To address data heterogeneity, NebulaFL adopts a version control-based asynchronous FL training scheme in each data center to balance training time among data owners. To reduce communication overhead, NebulaFL adopts a decentralized model rotation mechanism to achieve effective knowledge sharing among data centers. To balance training time and cost, NebulaFL integrates a reward-guided strategy for data owner selection and resource scheduling. The experimental results demonstrate that, compared to state-of-the-art FL methods, NebulaFL can achieve up to 5.71% accuracy improvement. In addition, under a target accuracy, NebulaFL can reduce communication overhead by up to 50% and costs by up to 61.94%.
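
The abstract names a version control-based asynchronous scheme and a model rotation mechanism but does not give their update rules. The sketch below is only an illustration of the general idea, not NebulaFL's actual algorithm. It assumes a staleness-discounted mixing rule of the kind commonly used in asynchronous FL and a simple ring topology for rotation. The names `VersionedServer`, `base_mix`, `apply_update`, and `rotate_models` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VersionedServer:
    """Hypothetical per-data-center server that tracks a global model version."""
    weights: List[float]
    version: int = 0
    base_mix: float = 0.5  # assumed mixing rate for a fully fresh update

    def apply_update(self, client_weights: List[float], client_version: int) -> float:
        # Staleness: number of global versions published since the client pulled.
        staleness = self.version - client_version
        if staleness < 0:
            raise ValueError("client version is ahead of the server")
        # Assumed discount (not from the paper): staler updates get less weight.
        alpha = self.base_mix / (1 + staleness)
        self.weights = [
            (1 - alpha) * g + alpha * c
            for g, c in zip(self.weights, client_weights)
        ]
        self.version += 1  # every accepted update yields a new global version
        return alpha


def rotate_models(models: list) -> list:
    """Pass each data center's model to its ring neighbour (assumed topology)."""
    return models[-1:] + models[:-1]
```

In this sketch a data owner that trained on an old version still contributes, only with a smaller mixing weight, so fast owners never wait for slow ones. Rotation moves whole models between data centers instead of aggregating them at a central point.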
