LG · DC · Mar 26, 2024

Not All Federated Learning Algorithms Are Created Equal: A Performance Evaluation Study

Gustav A. Baumgart, Jaemin Shin, Ali Payani, Myungjin Lee, Ramana Rao Kompella

arXiv:2403.17287v1 · 13.418 citations · h-index: 33 · Has Code

Originality: Synthesis-oriented

AI Analysis

This study addresses the need for comprehensive evaluation beyond accuracy in federated learning, providing empirical insights for researchers and practitioners to improve algorithm selection and benchmarking practices.

The paper evaluates several federated learning algorithms and finds that no single algorithm excels across all metrics; each one trades off accuracy, computational overhead, and stability differently.

Federated Learning (FL) has emerged as a practical approach to training a model from decentralized data. The proliferation of FL has led to the development of numerous FL algorithms and mechanisms. Many prior efforts have focused primarily on the accuracy of those approaches, but little is understood about other aspects such as computational overhead, performance, and training stability. To bridge this gap, we conduct an extensive performance evaluation of several canonical FL algorithms (FedAvg, FedProx, FedYogi, FedAdam, SCAFFOLD, and FedDyn) by leveraging an open-source federated learning framework called Flame. Our comprehensive measurement study reveals that no single algorithm works best across different performance metrics. A few key observations are: (1) While some state-of-the-art algorithms achieve higher accuracy than others, they incur either higher computation overheads (FedDyn) or higher communication overheads (SCAFFOLD). (2) Recent algorithms present a smaller standard deviation in accuracy across clients than FedAvg, indicating that the advanced algorithms' performance is stable. (3) However, algorithms such as FedDyn and SCAFFOLD are more prone to catastrophic failures without the support of additional techniques such as gradient clipping. We hope that our empirical study can help the community build best practices in evaluating FL algorithms.
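
To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's measurement code and not Flame's API. It assumes model weights are flat lists of floats, and all function names (`fedavg`, `accuracy_spread`, `clip_by_global_norm`) are hypothetical. It shows a sample-weighted FedAvg aggregation step, the standard deviation of accuracy across clients used as a stability indicator, and norm-based clipping as one common form of gradient clipping.

```python
import math
from statistics import mean, pstdev


def fedavg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def accuracy_spread(client_accuracies):
    """Return (mean, population standard deviation) of per-client accuracy."""
    return mean(client_accuracies), pstdev(client_accuracies)


def clip_by_global_norm(update, max_norm):
    """Rescale an update so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm == 0.0 or norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
    # Made-up per-client accuracies, for illustration only.
    print(accuracy_spread([0.8, 0.9, 1.0]))
    # An update with L2 norm 5 clipped to norm 1.
    print(clip_by_global_norm([3.0, 4.0], 1.0))
```

A smaller spread from `accuracy_spread` corresponds to the more uniform per-client accuracy described in observation (2). Clipping as in `clip_by_global_norm` bounds the size of any single update, which is the kind of safeguard observation (3) refers to.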
