FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission
This work addresses communication bottlenecks in cross-silo federated learning, particularly for large models. The contribution is incremental, since it layers RDMA optimizations on top of existing FL frameworks.
The paper tackles communication overhead in federated learning by proposing FedRDMA, a system that integrates RDMA with chunked transmission and optimizations, achieving up to 3.8x speedup in communication efficiency compared to TCP/IP-based systems.
Communication overhead is a significant bottleneck in federated learning (FL), and it has been exacerbated by the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and applies a series of optimization techniques to improve the efficiency and robustness of RDMA-based communication. We implement FedRDMA atop an industrial federated learning framework and evaluate it in a real-world cross-silo FL scenario. The experimental results show that FedRDMA can achieve up to 3.8x speedup in communication efficiency compared to traditional TCP/IP-based FL systems.
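The chunking idea can be illustrated with a minimal sketch. It only shows how a serialized model update might be split into fixed-size, indexed chunks and reassembled on the receiving side; it is not FedRDMA's actual implementation, and it makes no RDMA calls. The function names `split_into_chunks` and `reassemble`, the fixed `chunk_size`, and the byte-string stand-in for the model update are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, chunk) pairs that cover the payload in order.

    Hypothetical helper: the index lets a receiver restore the original
    order even if chunks are delivered or retried out of sequence.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        yield index, payload[offset:offset + chunk_size]


def reassemble(chunks: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild the payload from indexed chunks, regardless of arrival order."""
    ordered = sorted(chunks, key=lambda item: item[0])
    return b"".join(chunk for _, chunk in ordered)


if __name__ == "__main__":
    # Stand-in for a serialized model update (assumption: 1024 bytes).
    update = bytes(range(256)) * 4
    chunks = list(split_into_chunks(update, chunk_size=300))
    chunks.reverse()  # simulate out-of-order arrival
    assert reassemble(chunks) == update
```

In a sketch like this, a smaller chunk would presumably limit how much data must be resent when a single transfer fails, at the cost of more per-chunk overhead; how FedRDMA chooses chunk sizes and handles failures is not stated in the abstract.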