CYOct 31, 2025
Before the Clinic: Transparent and Operable Design Principles for Healthcare AIAlexander Bakumenko, Aaron J. Masino, Janine Hoelscher
The translation of artificial intelligence (AI) systems into clinical practice requires bridging fundamental gaps between explainable AI theory, clinician expectations, and governance requirements. While conceptual frameworks define what constitutes explainable AI (XAI) and qualitative studies identify clinician needs, little practical guidance exists for development teams to prepare AI systems prior to clinical evaluation. We propose two foundational design principles, Transparent Design and Operable Design, that operationalize pre-clinical technical requirements for healthcare AI. Transparent Design encompasses interpretability and understandability artifacts that enable case-level reasoning and system traceability. Operable Design encompasses calibration, uncertainty, and robustness to ensure reliable, predictable system behavior under real-world conditions. We ground these principles in established XAI frameworks, map them to documented clinician needs, and demonstrate their alignment with emerging governance requirements. This pre-clinical playbook provides actionable guidance for development teams, accelerates the path to clinical evaluation, and establishes a shared vocabulary bridging AI researchers, healthcare practitioners, and regulatory stakeholders. By explicitly scoping what can be built and verified before clinical deployment, we aim to reduce friction in clinical AI translation while remaining cautious about what constitutes validated, deployed explainability.
LGNov 19, 2025
Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality AttributionAlexander Bakumenko, Janine Hoelscher, Hudson Smith
Early identification of intensive care patients at risk of in-hospital mortality enables timely intervention and efficient resource allocation. Despite high predictive performance, existing machine learning approaches lack transparency and robustness, limiting clinical adoption. We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay. A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes. This traceable architecture allows for multilevel interpretability: feature attributions within each modality and direct per-case modality attributions quantifying how vitals and notes influence each decision. On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model (AUPRC 0.565 vs. 0.526; AUROC 0.891 vs. 0.876) while maintaining well-calibrated predictions. The system remains robust through a calibrated fallback when a modality is missing. These results demonstrate competitive performance with reliable, auditable risk estimates and transparent, predictable operation, which together are crucial for clinical use.