LGDC Jun 3, 2024

Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients

arXiv:2406.01439v2, 3 citations
AI Analysis

This paper addresses efficiency and scalability issues in federated learning applications with geographically distributed clients, and represents an incremental improvement over existing methods.

The paper tackles the scalability limitations of federated learning systems, such as server idle time and single-server bottlenecks, by proposing an asynchronous multi-server architecture that achieves similar or higher accuracy with 61% less training time in geo-distributed settings.

Federated learning (FL) systems enable multiple clients to train a machine learning model iteratively by synchronously exchanging intermediate model weights with a single server. The scalability of such FL systems can be limited by two factors: server idle time due to synchronous communication, and the risk of the single server becoming a bottleneck. In this paper, we propose a new FL architecture which is, to our knowledge, the first multi-server FL system that is entirely asynchronous, and which therefore addresses both limitations simultaneously. Our solution keeps both servers and clients continuously active. As in previous multi-server methods, clients interact solely with their nearest server, ensuring efficient integration of updates into the model. Unlike in those methods, however, servers also periodically update each other asynchronously and never postpone interactions with clients. We compare our solution to three representative baselines (FedAvg, FedAsync and HierFAVG) on the MNIST and CIFAR-10 image classification datasets and on the WikiText-2 language modeling dataset. Our solution converges to similar or higher accuracy than these baselines and requires 61% less time to do so in geo-distributed settings.
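
The interaction pattern described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state the aggregation rules, so the `Server` class, the mixing weights `alpha` and `beta`, and the staleness discount (borrowed in spirit from FedAsync-style mixing) are all assumptions made for illustration. The sketch only shows the shape of the idea: each server integrates a client update the moment it arrives, and integrates a peer server's model whenever one arrives, without waiting for anyone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    """Hypothetical server holding its own copy of the model weights."""
    weights: List[float]
    version: int = 0  # number of updates integrated so far

    def _mix(self, other: List[float], rate: float) -> None:
        # Move the local weights toward `other` by a fraction `rate`.
        self.weights = [(1 - rate) * w + rate * o
                        for w, o in zip(self.weights, other)]
        self.version += 1

    def on_client_update(self, client_weights: List[float],
                         base_version: int, alpha: float = 0.5) -> None:
        # Assumed rule: discount updates that were computed from an
        # older version of this server's model.
        staleness = self.version - base_version
        self._mix(client_weights, alpha / (1 + staleness))

    def on_peer_update(self, peer_weights: List[float],
                       beta: float = 0.5) -> None:
        # Assumed rule: plain weighted average with the peer's model.
        self._mix(peer_weights, beta)


# Two servers in different regions, each with its own model copy.
a = Server([0.0, 0.0])
b = Server([4.0, 4.0])

# A nearby client reports to server `a`; it is integrated immediately.
a.on_client_update([2.0, 2.0], base_version=0)
# A slower client trained on version 0 reports later; it counts for less.
a.on_client_update([3.0, 3.0], base_version=0)
# Server `a` pushes its model to `b`, which integrates it without pausing.
b.on_peer_update(a.weights)
```

In a real deployment each handler would run in response to network messages rather than in sequence; the point of the sketch is that no handler blocks on a round barrier.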

