LG · NI · Feb 2

Conflict-Aware Client Selection for Multi-Server Federated Learning

Mingwei Hong, Zheng Lin, Zehang Lin, Lin Li, Miao Yang, Xia Du, Zihan Fang, Zhaolu Kang, Dianxin Luan, Shunzhi Zhu

arXiv:2602.02458v1 · 7.56 citations · h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses bandwidth conflicts and training latency in multi-server federated learning systems, an incremental improvement for distributed machine learning applications.

The paper tackles resource contention and training failures in multi-server federated learning, which arise from overlapping client coverage and uncoordinated client selection. It proposes a decentralized reinforcement learning approach that reduces inter-server conflicts and improves training efficiency, with experiments reporting significant gains in convergence speed and communication cost.

Federated learning (FL) has emerged as a promising distributed machine learning (ML) paradigm that enables collaborative model training across clients without exposing raw data, thereby preserving user privacy and reducing communication costs. Despite these benefits, traditional single-server FL suffers from high communication latency because a single server must aggregate models from a large number of clients. Multi-server FL distributes the workload across edge servers, but overlapping client coverage and uncoordinated selection often lead to resource contention, causing bandwidth conflicts and training failures. To address these limitations, we propose a decentralized reinforcement learning framework with conflict risk prediction, named RL-CRP, to optimize client selection in multi-server FL systems. Specifically, each server estimates the likelihood of client selection conflicts using a categorical hidden Markov model based on its sparse historical client selection sequence. A fairness-aware reward mechanism is then incorporated to promote long-term client participation while minimizing training latency and resource contention. Extensive experiments demonstrate that the proposed RL-CRP framework effectively reduces inter-server conflicts and significantly improves training efficiency in terms of convergence speed and communication cost.
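To make the conflict risk prediction step more concrete, the sketch below shows one way a server might turn a short history of conflict observations into a next-round conflict probability using the forward recursion of a categorical hidden Markov model. It is an illustrative assumption, not the authors' implementation: the two hidden states ("low" and "high" contention), the binary observation encoding (0 = no conflict, 1 = conflict), the function name `predict_next_obs`, and all parameter values are hypothetical, and the parameters are fixed by hand here, whereas in practice they would have to be estimated from data.

```python
def predict_next_obs(pi, A, B, obs):
    """One-step-ahead observation distribution for a categorical HMM.

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices (may be empty)
    """
    n_states = len(pi)
    n_symbols = len(B[0])

    if not obs:
        # No history yet: predict the first observation from the prior.
        state = list(pi)
    else:
        # Forward filtering, normalised at each step to avoid underflow.
        alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        for o in obs[1:]:
            pred = [sum(alpha[i] * A[i][j] for i in range(n_states))
                    for j in range(n_states)]
            alpha = [pred[j] * B[j][o] for j in range(n_states)]
            total = sum(alpha)
            alpha = [a / total for a in alpha]
        # Propagate the filtered belief one step into the future.
        state = [sum(alpha[i] * A[i][j] for i in range(n_states))
                 for j in range(n_states)]

    # Mix the emission distributions by the predicted state belief.
    return [sum(state[j] * B[j][k] for j in range(n_states))
            for k in range(n_symbols)]


# Hypothetical parameters: state 0 = low contention, state 1 = high contention.
pi = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[0.9, 0.1],   # low contention: conflicts are rare
     [0.3, 0.7]]   # high contention: conflicts are common

quiet_history = [0, 0, 0]   # hypothetical: no conflicts observed
busy_history = [1, 1, 1]    # hypothetical: repeated conflicts observed

risk_quiet = predict_next_obs(pi, A, B, quiet_history)[1]
risk_busy = predict_next_obs(pi, A, B, busy_history)[1]
```

Under these assumed parameters, a history of repeated conflicts yields a higher predicted risk than a quiet history, which is the kind of signal a selection policy could use to steer away from contested clients.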
