Jana Fehr

2 Papers

Feb 27

How Meta-research Can Pave the Road Towards Trustworthy AI In Healthcare: Catalogue of Ideas and Roadmap for Future Research

Valerie Bürger, Marlie Besouw, Jana Fehr et al.

Meta-research and Trustworthy AI (TAI) share common goals, namely improving evidence, robustness, and transparency, yet there is very little interplay between the two fields. To investigate the potential benefits of closer collaboration between the domains of TAI in healthcare and meta-research, we convened an interdisciplinary workshop funded by the Volkswagen Foundation in February 2025. The workshop aimed to collaboratively examine key tensions in translating AI ethics principles into practice and to identify potential solutions informed by meta-research approaches. A Design Thinking-informed co-creation approach was followed by an inductive descriptive analysis of the outputs. Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare. These challenges include achieving robustness, reproducibility, and replicability; late-stage development and the integration of AI into clinical practice; the selection of appropriate evaluation metrics; specific AI-related challenges in preclinical and biomedical research; gaps in transparency in medical AI; and the need for improved conceptual clarity and AI literacy among stakeholders. Finally, we offer a catalogue of ideas and a roadmap for future research to inform scholars in both fields about existing interconnections and to serve as a foundation for guiding future interdisciplinary efforts.

Jan 30

Metric Hub: A metric library and practical selection workflow for use-case-driven data quality assessment in medical AI

Katinka Becker, Maximilian P. Oppelt, Tobias S. Zech et al.

Machine learning (ML) in medicine has transitioned from research to concrete applications that support medical purposes such as therapy selection, monitoring, and treatment. Acceptance and effective adoption by clinicians and patients, as well as regulatory approval, require evidence of trustworthiness. A major factor in the development of trustworthy AI is the quantification of data quality for AI model training and testing. We have recently proposed the METRIC-framework for systematically evaluating the suitability (fit-for-purpose) of data for medical ML for a given task. Here, we operationalize this theoretical framework by introducing a collection of data quality metrics, the metric library, for practically measuring data quality dimensions. For each metric, we provide a metric card with the most important information, including definition, applicability, examples, pitfalls, and recommendations, to support the understanding and implementation of these metrics. Furthermore, we discuss strategies and provide decision trees for choosing an appropriate set of data quality metrics from the metric library given specific use cases. We demonstrate the impact of our approach using the PTB-XL ECG dataset as an example. This is a first step toward enabling fit-for-purpose evaluation of training and test data in practice as the basis for establishing trustworthy AI in medicine.