LG DC Apr 1, 2025

EMO: Edge Model Overlays to Scale Model Size in Federated Learning

Di Wu, Weibo He, Wanglei Feng, Zhenyu Wen, Bin Qian, Blesson Varghese

arXiv:2504.00726v14.1 h-index: 27 ICDCSW

Originality: Incremental advance

AI Analysis

This paper addresses the problem of scaling model size in federated learning for edge computing. Its solution is an incremental advance that builds on existing FL and SFL approaches.

The paper tackles the difficulty of training large models with Federated Learning (FL) on resource-limited edge devices. It proposes EMO, which uses Edge Model Overlays to scale model size without modifying the FL workflow. EMO improves accuracy by up to 17.77% over FL and, compared to Split Federated Learning (SFL), reduces communication cost by up to 7.17x and training time by up to 6.9x.

Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three key modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to optimize communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, and reduces communication costs by up to 7.17x and decreases training time by up to 6.9x compared to SFL.
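The abstract does not say how the hierarchical activation replay cache is built, so the following is only a minimal sketch of the general idea: devices write activations into a cache, and the overlay reads sampled batches from it at its own pace, which keeps the two training loops decoupled. The class name `TwoTierReplayCache`, the two-tier layout (a small "fast" tier that demotes its oldest entries into a larger "slow" tier), the eviction policy, and the key format are all assumptions made for illustration, not the authors' design.

```python
import random
from collections import OrderedDict


class TwoTierReplayCache:
    """Hypothetical two-tier activation cache (illustrative, not EMO's actual design)."""

    def __init__(self, fast_capacity, slow_capacity):
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.fast = OrderedDict()  # newest entries; stands in for memory
        self.slow = OrderedDict()  # demoted entries; stands in for slower storage

    def put(self, key, activation, label):
        """Store an activation uploaded by a device (the producer side)."""
        self.slow.pop(key, None)  # a refreshed key lives only in the fast tier
        self.fast[key] = (activation, label)
        self.fast.move_to_end(key)
        # Demote the oldest fast entries once the fast tier overflows.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
        # Drop the oldest slow entries once the slow tier overflows.
        while len(self.slow) > self.slow_capacity:
            self.slow.popitem(last=False)

    def __len__(self):
        return len(self.fast) + len(self.slow)

    def sample(self, batch_size, rng):
        """Return a random batch for the overlay trainer (the consumer side)."""
        items = list(self.fast.values()) + list(self.slow.values())
        return rng.sample(items, min(batch_size, len(items)))


# Producer: a device uploads one activation per step (toy values).
cache = TwoTierReplayCache(fast_capacity=2, slow_capacity=3)
for step in range(6):
    cache.put(("device0", step), [float(step)], step % 2)

# Consumer: the overlay draws a batch whenever it is ready to train.
batch = cache.sample(4, random.Random(0))
print(len(cache), len(batch))
```

In this sketch the overlay never waits on a device: it replays whatever activations are cached, which is one plausible reading of how a replay cache can decouple AFL from FL.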
