LG DC May 3, 2024

Holistic Evaluation Metrics: Use Case Sensitive Evaluation Metrics for Federated Learning

Yanli Li, Jehad Ibrahim, Huaming Chen, Dong Yuan, Kim-Kwang Raymond Choo

arXiv:2405.02360v1 6.43 citations h-index: 4 Tsinghua Science and Technology

Originality: Incremental advance

AI Analysis

This work addresses the need for comprehensive evaluation in federated learning, though it is incremental as it builds on existing metrics by adding use-case sensitivity.

The paper argues that a single metric is insufficient for evaluating federated learning algorithms and introduces Holistic Evaluation Metrics (HEM), which combine several aspects, such as accuracy and fairness, using use-case-specific importance vectors. Its experiments indicate that HEM identifies suitable algorithms for scenarios such as IoT and smart devices.

A large number of federated learning (FL) algorithms have been proposed for different applications and from varying perspectives. However, the evaluation of such approaches often relies on a single metric (e.g., accuracy). Such a practice fails to account for the diverse requirements of different use cases. Thus, how to comprehensively evaluate an FL algorithm and determine the most suitable candidate for a designated use case remains an open question. To address this research gap, we introduce the Holistic Evaluation Metrics (HEM) for FL in this work. Specifically, we focus on three primary use cases: Internet of Things (IoT), smart devices, and institutions. The evaluation metric encompasses several aspects, including accuracy, convergence, computational efficiency, fairness, and personalization. We then assign an importance vector to each use case, reflecting its distinct performance requirements and priorities. The HEM index is finally generated by integrating these metric components with their respective importance vectors. By evaluating different FL algorithms in these three prevalent use cases, our experimental results demonstrate that HEM can effectively assess and identify the FL algorithms best suited to particular scenarios. We anticipate that this work will shed light on the evaluation process for pragmatic FL algorithms in real-world applications.
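The abstract does not state how the metric components and importance vectors are combined, so the following is only a minimal sketch that assumes a normalized weighted sum. The function names (`hem_index`, `best_algorithm`), the 0-to-1 score scale, and every number below are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict

# The five aspects named in the abstract.
METRICS = ("accuracy", "convergence", "efficiency", "fairness", "personalization")


def hem_index(scores: Dict[str, float], importance: Dict[str, float]) -> float:
    """Combine per-aspect scores with a use-case importance vector.

    Assumption: each score is already normalized to [0, 1] (higher is better)
    and the index is a weighted average. The paper may define it differently.
    """
    missing = [m for m in METRICS if m not in scores or m not in importance]
    if missing:
        raise KeyError(f"missing metric components: {missing}")
    total_weight = sum(importance[m] for m in METRICS)
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")
    return sum(importance[m] * scores[m] for m in METRICS) / total_weight


def best_algorithm(candidates: Dict[str, Dict[str, float]],
                   importance: Dict[str, float]) -> str:
    """Return the candidate name with the highest index for this use case."""
    return max(candidates, key=lambda name: hem_index(candidates[name], importance))


# Hypothetical normalized scores for two made-up algorithms.
candidates = {
    "algo_a": {"accuracy": 0.9, "convergence": 0.5, "efficiency": 0.4,
               "fairness": 0.7, "personalization": 0.6},
    "algo_b": {"accuracy": 0.8, "convergence": 0.7, "efficiency": 0.9,
               "fairness": 0.6, "personalization": 0.5},
}

# Hypothetical importance vectors: the IoT one stresses efficiency,
# the institutional one stresses accuracy.
iot_importance = {"accuracy": 1, "convergence": 2, "efficiency": 4,
                  "fairness": 1, "personalization": 2}
institution_importance = {"accuracy": 4, "convergence": 1, "efficiency": 1,
                          "fairness": 2, "personalization": 2}

iot_choice = best_algorithm(candidates, iot_importance)
institution_choice = best_algorithm(candidates, institution_importance)
```

With these made-up numbers the two importance vectors select different candidates, which is the kind of use-case sensitivity the abstract describes: the same set of per-aspect scores can rank algorithms differently once priorities change.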

