LG SY

Jan 5

Distributed Federated Learning by Alternating Periods of Training

Shamik Bhattacharyya, Rachel Kalpana Kalaimani

arXiv:2601.01793v1

1.4

h-index: 7

Originality: Incremental advance

AI Analysis

The paper addresses scalability and fault tolerance in federated learning systems with many clients, though the contribution appears incremental.

The paper tackles the scalability and fault-tolerance issues in federated learning by proposing a distributed approach with multiple servers and inter-server communication, showing that servers converge to a common model within a small tolerance of the ideal model.

Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, the dependence on a single central server is challenging in the case of a large number of clients and even poses the risk of a single point of failure. To address these critical limitations of scalability and fault-tolerance, we present a distributed approach to federated learning comprising multiple servers with inter-server communication capabilities. While providing a fully decentralized approach, the designed framework retains the core federated learning structure where each server is associated with a disjoint set of clients with server-client communication capabilities. We propose a novel DFL (Distributed Federated Learning) algorithm which uses alternating periods of local training on the client data followed by global training among servers. We show that the DFL algorithm, under a suitable choice of parameters, ensures that all the servers converge to a common model value within a small tolerance of the ideal model, thus exhibiting effective integration of local and global training models. Finally, we illustrate our theoretical claims through numerical simulations.
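The alternating structure described in the abstract can be illustrated with a toy sketch. This is not the paper's DFL algorithm; the abstract does not give its update rules. The sketch assumes scalar models, a quadratic loss 0.5 * (w - target)^2 per client, plain gradient steps, equal client counts per server, and a fully connected server graph with symmetric mixing weights. The function names (`local_period`, `global_period`, `run_dfl_sketch`) and all parameter values are hypothetical.

```python
def local_period(server_model, client_targets, steps, lr):
    """Local training: each client starts from its server's model, takes
    `steps` gradient steps on 0.5 * (w - target)^2, and the server then
    averages the resulting client models."""
    client_models = []
    for target in client_targets:
        w = server_model
        for _ in range(steps):
            w -= lr * (w - target)  # gradient of the assumed quadratic loss
        client_models.append(w)
    return sum(client_models) / len(client_models)


def global_period(models, neighbors, steps, mix):
    """Global training (assumed here to be plain consensus averaging):
    each server moves toward its neighbors' models for `steps` iterations."""
    for _ in range(steps):
        models = [
            w + mix * sum(models[j] - w for j in neighbors[i])
            for i, w in enumerate(models)
        ]
    return models


def run_dfl_sketch(clients_per_server, neighbors, rounds=200,
                   local_steps=2, global_steps=5, lr=0.05, mix=0.3):
    """Alternate a local period and a global period for `rounds` rounds."""
    models = [0.0] * len(clients_per_server)
    for _ in range(rounds):
        models = [local_period(w, targets, local_steps, lr)
                  for w, targets in zip(models, clients_per_server)]
        models = global_period(models, neighbors, global_steps, mix)
    return models


# Three servers, each with a disjoint pair of clients (hypothetical data).
clients_per_server = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
neighbors = [[1, 2], [0, 2], [0, 1]]  # fully connected server graph
models = run_dfl_sketch(clients_per_server, neighbors)
print(models)
```

In this toy setting the ideal model is the mean of all client targets, 3.5, and the three server models should end close to it and to each other. A small disagreement remains because every local period pulls each server back toward its own clients, which loosely mirrors the "small tolerance" in the abstract's convergence claim.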
