LG, DC, NI, OC | Oct 23, 2020

Throughput-Optimal Topology Design for Cross-Silo Federated Learning

arXiv:2010.12229v2 | 110 citations
Originality: Highly original
AI Analysis

The paper addresses communication inefficiencies in federated learning among organizations (data silos) connected by high-speed links, and offers practical algorithms to reduce them.

The paper tackles the communication bottleneck in cross-silo federated learning by designing throughput-optimal topologies. In realistic networks with 10 Gbps access links, it reports speedups of 9x over the master-slave architecture and 1.5x over MATCHA, with larger speedups on slower access links.

Federated learning usually employs a client-server architecture in which an orchestrator iteratively aggregates model updates from remote clients and pushes a refined model back to them. This approach may be inefficient in cross-silo settings, as close-by data silos with high-speed access links may exchange information faster than with the orchestrator, and the orchestrator may become a communication bottleneck. In this paper, we define the problem of topology design for cross-silo federated learning, using the theory of max-plus linear systems to compute the system throughput, i.e., the number of communication rounds per time unit. We also propose practical algorithms that, given knowledge of measurable network characteristics, find a topology with the largest throughput or with provable throughput guarantees. In realistic Internet networks with 10 Gbps access links for silos, our algorithms speed up training by factors of 9 and 1.5 in comparison to the master-slave architecture and to the state-of-the-art MATCHA, respectively. Speedups are even larger with slower access links.
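To make the max-plus view of throughput concrete, here is a minimal Python sketch. It is not the paper's code. It assumes that the time at which silo j starts round k+1 follows a recursion of the form x_j(k+1) = max_i (x_i(k) + delay[i][j]), where delay[i][j] is a hypothetical per-arc delay (computation plus communication) and a self-loop carries the local computation time. Under that assumption, and for a strongly connected topology, a standard max-plus result says the time per round converges to the maximum cycle mean of the delay graph, so the throughput is its reciprocal. The function names and the delay values are illustrative only.

```python
import math

NEG_INF = -math.inf  # marks a missing arc in the overlay topology


def max_cycle_mean(delay):
    """Karp's algorithm for the maximum cycle mean of a digraph.

    delay[i][j] is the weight of arc i -> j, or -inf if the arc is absent.
    Assumes the graph is strongly connected.
    """
    n = len(delay)
    # D[k][v] = largest total weight of a walk with exactly k arcs from node 0 to v.
    D = [[NEG_INF] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u in range(n):
            if D[k - 1][u] == NEG_INF:
                continue
            for v in range(n):
                if delay[u][v] != NEG_INF:
                    D[k][v] = max(D[k][v], D[k - 1][u] + delay[u][v])
    best = NEG_INF
    for v in range(n):
        if D[n][v] == NEG_INF:
            continue
        worst = min(
            (D[n][v] - D[k][v]) / (n - k)
            for k in range(n)
            if D[k][v] != NEG_INF
        )
        best = max(best, worst)
    return best


def simulate_round_times(delay, rounds):
    """Iterate the assumed max-plus recursion and return the final start times."""
    n = len(delay)
    x = [0.0] * n
    for _ in range(rounds):
        x = [max(x[i] + delay[i][j] for i in range(n)) for j in range(n)]
    return x


# Hypothetical 3-silo directed ring 0 -> 1 -> 2 -> 0 with arc delays 2, 3, 4
# and a local computation time of 1 on every self-loop (arbitrary time units).
DELAY = [
    [1.0, 2.0, NEG_INF],
    [NEG_INF, 1.0, 3.0],
    [4.0, NEG_INF, 1.0],
]

cycle_time = max_cycle_mean(DELAY)
throughput = 1.0 / cycle_time  # communication rounds per time unit
print(f"cycle time = {cycle_time}, throughput = {throughput:.3f}")
print("start times after 300 rounds:", simulate_round_times(DELAY, 300))
```

In this toy ring the critical cycle is the ring itself, with mean (2 + 3 + 4) / 3 = 3, so the simulated start times grow by 3 time units per round on average. Comparing candidate topologies then amounts to comparing their maximum cycle means.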

Code Implementations: 1 repo
