CR LG | Jan 29, 2025

Byzantine-Robust Federated Learning over Ring-All-Reduce Distributed Computing

Minghong Fang, Zhuqing Liu, Xuecen Zhao, Jia Liu

arXiv:2501.17392v213.47 citations | h-index: 5 | WWW

Originality: Highly original

AI Analysis

This work addresses security risks in scalable distributed learning for privacy-sensitive applications and is presented as a foundational advance in the field.

The paper tackles Byzantine attacks in federated learning over ring-all-reduce architectures. It proposes BRACE, described as the first ring-all-reduce-based federated learning algorithm to achieve both Byzantine robustness and communication efficiency, and supports it with theoretical convergence guarantees and experimental validation.

Federated learning (FL) has gained attention as a distributed learning paradigm for its data privacy benefits and accelerated convergence through parallel computation. Traditional FL relies on a server-client (SC) architecture, where a central server coordinates multiple clients to train a global model, but this approach faces scalability challenges due to server communication bottlenecks. To overcome this, the ring-all-reduce (RAR) architecture has been introduced, eliminating the central server and achieving bandwidth optimality. However, the tightly coupled nature of RAR's ring topology exposes it to unique Byzantine attack risks not present in SC-based FL. Despite its potential, designing Byzantine-robust RAR-based FL algorithms remains an open problem. To address this gap, we propose BRACE (Byzantine-robust ring-all-reduce), the first RAR-based FL algorithm to achieve both Byzantine robustness and communication efficiency. We provide theoretical guarantees for the convergence of BRACE under Byzantine attacks, demonstrate its bandwidth efficiency, and validate its practical effectiveness through experiments. Our work offers a foundational understanding of Byzantine-robust RAR-based FL design.
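
To make the ring topology concrete, here is a minimal in-process sketch of the standard ring-all-reduce pattern (a reduce-scatter phase followed by an all-gather phase). It is not BRACE and is not taken from the paper. The function name `ring_all_reduce`, the `byzantine` parameter, and the additive `1e6` corruption are assumptions made purely for illustration, to show why one misbehaving node in the ring can contaminate the result every other node receives.

```python
import numpy as np

def ring_all_reduce(vectors, byzantine=None):
    """Simulate ring all-reduce (sum) over n nodes in one process.

    vectors:   list of n equal-length 1-D arrays, one per node.
    byzantine: optional node index (hypothetical) that corrupts every
               partial sum it forwards during the reduce-scatter phase.
    Returns the vector each node ends up holding.
    """
    n = len(vectors)
    # Each node splits its vector into n chunks (copied so inputs stay intact).
    chunks = [[c.copy() for c in np.array_split(np.asarray(v, dtype=float), n)]
              for v in vectors]

    # Phase 1, reduce-scatter: in step s, node i sends chunk (i - s) mod n
    # to its successor, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            idx = (i - step) % n
            payload = chunks[i][idx].copy()
            if i == byzantine:
                payload += 1e6  # illustrative attack: poison the partial sum
            msgs.append((idx, payload))
        for i, (idx, payload) in enumerate(msgs):
            chunks[(i + 1) % n][idx] += payload

    # Phase 2, all-gather: node i now owns the fully reduced chunk (i + 1) mod n
    # and the reduced chunks are passed around the ring, overwriting stale ones.
    for step in range(n - 1):
        msgs = [((i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                for i in range(n)]
        for i, (idx, payload) in enumerate(msgs):
            chunks[(i + 1) % n][idx] = payload

    return [np.concatenate(node) for node in chunks]


# Three nodes, each holding a 6-dimensional "gradient".
grads = [np.arange(6.0), np.ones(6), 2 * np.ones(6)]
honest = ring_all_reduce(grads)
attacked = ring_all_reduce(grads, byzantine=1)
```

In the honest run every node ends with the element-wise sum of all inputs. In the attacked run the corrupted partial sums are forwarded and then broadcast, so no downstream node can separate the honest contributions from the poisoned ones. This is the kind of tight coupling the abstract refers to.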
