Christian Zirpins

h-index: 13

3 papers

4 citations

Novelty: 33%

AI Score: 38

Ranked #84,443 of 194,257 authors (top 43%); #395 in DC (top 41%)

3 Papers

7.6 | DC | Mar 26 | Code

Revealing the influence of participant failures on model quality in cross-silo Federated Learning

Fabian Stricker, David Bermbach, Christian Zirpins

Federated Learning (FL) is a paradigm for training machine learning (ML) models in collaborative settings while preserving participants' privacy by keeping raw data local. A key requirement for the use of FL in production is reliability, as insufficient reliability can compromise the validity, stability, and reproducibility of learning outcomes. FL inherently operates as a distributed system and is therefore susceptible to crash failures, network partitioning, and other fault scenarios. Despite this, the impact of such failures on FL outcomes has not yet been studied systematically. In this paper, we address this gap by investigating the impact of missing participants in FL. To this end, we conduct extensive experiments on image, tabular, and time-series data and analyze how the absence of participants affects model performance, taking into account influencing factors such as data skewness, different availability patterns, and model architectures. Furthermore, we examine scenario-specific aspects, including the utility of the global model for missing participants. Our experiments provide detailed insights into the effects of various influencing factors. In particular, we show that data skewness has a strong impact, often leading to overly optimistic model evaluations and, in some cases, even altering the effects of other influencing factors.
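To make the notion of a "missing participant" concrete, the sketch below shows how one aggregation round changes when a participant drops out. The abstract does not name the aggregation rule used in the paper, so this assumes FedAvg-style averaging weighted by local sample count. The function name `aggregate_round`, the participant IDs, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def aggregate_round(
    updates: Dict[str, List[float]],
    sample_counts: Dict[str, int],
    available: Set[str],
) -> List[float]:
    """Sample-weighted average of parameter vectors (assumed FedAvg-style).

    Only participants listed in `available` contribute; a crashed or
    partitioned participant is simply left out of the average.
    """
    present = [p for p in updates if p in available]
    if not present:
        raise ValueError("no participant available in this round")
    total = sum(sample_counts[p] for p in present)
    dim = len(updates[present[0]])
    return [
        sum(updates[p][i] * sample_counts[p] for p in present) / total
        for i in range(dim)
    ]


# Hypothetical toy round: participant "c" holds half of the data and a
# very different local model, mimicking a skewed data distribution.
updates = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [10.0, 10.0]}
sample_counts = {"a": 100, "b": 100, "c": 200}

full_model = aggregate_round(updates, sample_counts, {"a", "b", "c"})
degraded_model = aggregate_round(updates, sample_counts, {"a", "b"})
```

In this toy setup, losing "c" shifts the aggregated parameters noticeably because "c" carries both a large weight and a dissimilar update, which is the kind of interaction between skewness and availability that the abstract describes.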

5.3 | LG | May 8

FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning

Fabian Stricker, Jose A. Peregrina, David Bermbach et al.

Performance evaluation is essential for assessing the quality of machine learning (ML) models and guiding deployment decisions. In federated learning (FL), assessing performance is challenging because data are distributed across participants. Consequently, the coordinator must rely on locally computed evaluation metrics and aggregate them to assess the global model. A key challenge is that common aggregation strategies, such as averaging weighted by each participant's number of local samples, do not always produce the same results as centralized evaluation. Existing definitions of performance evaluation are largely tailored to accuracy and do not generalize to other metrics, leading to inconsistencies between participant-based and centralized evaluation. Such discrepancies are inconsistent with the FL objective and yield incorrectly computed metrics. To address this issue, we examine the underlying reasons for these discrepancies and propose FLAM, a performance evaluation method based on aggregatable measures that yields the same results as centralized evaluation without the need for a global test dataset.
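The discrepancy described above can be reproduced with a small example. The sketch below is not FLAM itself, whose details the abstract does not give. It only assumes that each participant reports raw confusion counts (true positives, false positives, false negatives) and compares two ways of obtaining a global F1 score. The `Counts` class, the function names, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Counts:
    """Per-participant confusion counts for the positive class."""
    tp: int
    fp: int
    fn: int


def f1(c: Counts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def weighted_average_f1(local: List[Counts], sizes: List[int]) -> float:
    """Average the locally computed F1 scores, weighted by local sample count."""
    return sum(f1(c) * n for c, n in zip(local, sizes)) / sum(sizes)


def pooled_f1(local: List[Counts]) -> float:
    """Sum the counts first, then compute F1 once on the totals.

    This matches what a centralized evaluation on the union of all
    local test sets would report.
    """
    total = Counts(
        tp=sum(c.tp for c in local),
        fp=sum(c.fp for c in local),
        fn=sum(c.fn for c in local),
    )
    return f1(total)


# Hypothetical reports from two participants with 200 and 100 test samples.
local_counts = [Counts(tp=90, fp=10, fn=10), Counts(tp=1, fp=9, fn=9)]
local_sizes = [200, 100]

avg_score = weighted_average_f1(local_counts, local_sizes)  # about 0.633
central_score = pooled_f1(local_counts)                     # about 0.827
```

The two results differ because F1 is a ratio of counts rather than a per-sample mean, so averaging local scores does not commute with pooling the data. Accuracy, by contrast, is a per-sample mean, which is why sample-weighted averaging happens to work for it.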

5.9 | DC | Jan 31, 2025

FL-APU: A Software Architecture to Ease Practical Implementation of Cross-Silo Federated Learning

F. Stricker, J. A. Peregrina, D. Bermbach et al.

Federated Learning (FL) is an emerging technology that is increasingly applied in real-world applications. Early applications focused on cross-device scenarios, where many participants with limited resources train machine learning (ML) models together, e.g., in the case of Google's GBoard. In contrast, cross-silo scenarios involve only a few participants, each with substantial resources, e.g., in the healthcare domain. Despite such early efforts, FL is still rarely used in practice, and best practices are therefore missing. For new applications, in our case inter-organizational cross-silo applications, overcoming this lack of role models is a significant challenge. To ease the use of FL in real-world cross-silo applications, we propose a scenario-based architecture for the practical use of FL in the context of multiple companies collaborating to improve the quality of their ML models. The architecture emphasizes the collaboration between the participants and the FL server and extends basic interactions with domain-specific features. First, it combines governance with authentication, creating an environment where only trusted participants can join. Second, it offers traceability of governance decisions and tracking of training processes, both of which are crucial in a production environment. Beyond presenting the architectural design, we analyze requirements for the real-world use of FL and evaluate the architecture with a scenario-based analysis method.