Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning
Flight addresses the inefficiency of federated learning in IoT and other hierarchical distributed systems. It offers a scalable solution with reduced communication costs, though it is an incremental improvement over existing FL frameworks.
The paper tackles the limitation that federated learning (FL) frameworks support only simple two-tier topologies, which do not exploit the hierarchies of real-world distributed systems. It introduces Flight, a framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. Results show that Flight scales to 2048 simultaneous devices, reduces FL makespan across several models, and that its hierarchical FL model cuts communication overhead by more than 60%. The scaling and makespan comparisons are against Flower, a state-of-the-art FL framework.
Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
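To make the idea of multi-tier aggregation concrete, the following is a minimal sketch of sample-weighted averaging over a tree of nodes, assuming each leaf device holds a flat list of model parameters and a sample count. It does not use Flight's or Flower's API; the `Node` class and the `aggregate` function are hypothetical names introduced only for illustration, and the sketch is synchronous, so it does not model asynchronous aggregation or the control/data plane split.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical topology node: a leaf device or an intermediate aggregator."""
    name: str
    children: List["Node"] = field(default_factory=list)
    weights: Optional[List[float]] = None  # local model parameters (leaves only)
    n_samples: int = 0                     # local training-set size (leaves only)


def aggregate(node: Node) -> Tuple[List[float], int]:
    """Return (averaged parameters, total samples) for the subtree rooted at node.

    Each aggregator averages its children's results, weighted by the number
    of samples each child represents, and forwards a single model upward.
    """
    if not node.children:
        return list(node.weights), node.n_samples
    results = [aggregate(child) for child in node.children]
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    averaged = [sum(w[i] * n for w, n in results) / total for i in range(dim)]
    return averaged, total


# Illustrative three-tier topology: root -> two edge aggregators -> three devices.
edge1 = Node("edge1", children=[
    Node("dev_a", weights=[1.0, 2.0], n_samples=10),
    Node("dev_b", weights=[3.0, 4.0], n_samples=30),
])
edge2 = Node("edge2", children=[
    Node("dev_c", weights=[5.0, 6.0], n_samples=60),
])
root = Node("root", children=[edge1, edge2])

global_weights, global_samples = aggregate(root)
print(global_weights, global_samples)
```

In this toy topology the root receives two models (one per edge aggregator) rather than three (one per device), which illustrates where a hierarchical scheme can save traffic to the central server. Because the averaging is weighted by sample count at every tier, the result matches a flat weighted average over all devices. The numbers here are invented for the example and say nothing about the measured results reported above.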