LGAug 19, 2022
Diffusion-based Time Series Imputation and Forecasting with Structured State Space ModelsJuan Miguel Lopez Alcaraz, Nils Strodthoff
The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
SPJan 19, 2023
Diffusion-based Conditional ECG Generation with Structured State Space ModelsJuan Miguel Lopez Alcaraz, Nils Strodthoff
Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.
LGJul 25, 2024
Enhancing clinical decision support with physiological waveforms -- a multimodal benchmark in emergency careJuan Miguel Lopez Alcaraz, Hjalmar Bouma, Nils Strodthoff
Background: AI-driven prediction algorithms have the potential to enhance emergency medicine by enabling rapid and accurate decision-making regarding patient status and potential deterioration. However, the integration of multimodal data, including raw waveform signals, remains underexplored in clinical decision support. Methods: We present a dataset and benchmarking protocol designed to advance multimodal decision support in emergency care. Our models utilize demographics, biometrics, vital signs, laboratory values, and electrocardiogram (ECG) waveforms as inputs to predict both discharge diagnoses and patient deterioration. Results: The diagnostic model achieves area under the receiver operating curve (AUROC) scores above 0.8 for 609 out of 1,428 conditions, covering both cardiac (e.g., myocardial infarction) and non-cardiac (e.g., renal disease, diabetes) diagnoses. The deterioration model attains AUROC scores above 0.8 for 14 out of 15 targets, accurately predicting critical events such as cardiac arrest, mechanical ventilation, ICU admission, and mortality. Conclusions: Our study highlights the positive impact of incorporating raw waveform data into decision support models, improving predictive performance. By introducing a unique, publicly available dataset and baseline models, we provide a foundation for measurable progress in AI-driven decision support for emergency care.
SPOct 11, 2023
Using explainable AI to investigate electrocardiogram changes during healthy aging -- from expert features to raw signalsGabriel Ott, Yannik Schaubelt, Juan Miguel Lopez Alcaraz et al.
Cardiovascular diseases remain the leading global cause of mortality. Age is an important covariate whose effect is most easily investigated in a healthy cohort to properly distinguish the former from disease-related changes. Traditionally, most of such insights have been drawn from the analysis of electrocardiogram (ECG) feature changes in individuals as they age. However, these features, while informative, may potentially obscure underlying data relationships. In this paper we present the following contributions: (1) We employ a deep-learning model and a tree-based model to analyze ECG data from a robust dataset of healthy individuals across varying ages in both raw signals and ECG feature format. (2) We use explainable AI methods to identify the most discriminative ECG features across age groups.(3) Our analysis with tree-based classifiers reveals age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. (4) Furthermore, the deep-learning model underscores the pivotal role of the P-wave in age predictions across all age groups, suggesting potential changes in the distribution of different P-wave types with age. These findings shed new light on age-related ECG changes, offering insights that transcend traditional feature-based approaches.
SPJul 26, 2024
CardioLab: Laboratory Values Estimation from Electrocardiogram Features - An Exploratory StudyJuan Miguel Lopez Alcaraz, Nils Strodthoff
Laboratory value represents a cornerstone of medical diagnostics, but suffers from slow turnaround times, and high costs and only provides information about a single point in time. The continuous estimation of laboratory values from non-invasive data such as electrocardiogram (ECG) would therefore mark a significant frontier in healthcare monitoring. Despite its potential, this domain remains relatively underexplored. In this preliminary study, we used a publicly available dataset (MIMIC-IV-ECG) to investigate the feasibility of inferring laboratory values from ECG features and patient demographics using tree-based models (XGBoost). We define the prediction task as a binary problem of whether the lab value falls into low or high abnormalities. We assessed model performance with AUROC. Our findings demonstrate promising results in the estimation of laboratory values related to different organ systems. While further research and validation are warranted to fully assess the clinical utility and generalizability of the approach, our findings lay the groundwork for future investigations for laboratory value estimation using ECG data. Such advancements hold promise for revolutionizing predictive healthcare applications, offering faster, non-invasive, and more affordable means of patient monitoring.
SPAug 30, 2024
Estimation of Cardiac and Non-cardiac Diagnosis from Electrocardiogram FeaturesJuan Miguel Lopez Alcaraz, Nils Strodthoff
Ensuring timely and accurate diagnosis of medical conditions is paramount for effective patient care. Electrocardiogram (ECG) signals are fundamental for evaluating a patient's cardiac health and are readily available. Despite this, little attention has been given to the remarkable potential of ECG data in detecting non-cardiac conditions. In our study, we used publicly available datasets (MIMIC-IV-ECG-ICD and ECG-VIEW II) to investigate the feasibility of inferring general diagnostic conditions from ECG features. To this end, we trained a tree-based model (XGBoost) based on ECG features and basic demographic features to estimate a wide range of diagnoses, encompassing both cardiac and non-cardiac conditions. Our results demonstrate the reliability of estimating 23 cardiac as well as 21 non-cardiac conditions above 0.7 AUROC in a statistically significant manner across a wide range of physiological categories. Our findings underscore the predictive potential of ECG data in identifying well-known cardiac conditions. However, even more striking, this research represents a pioneering effort in systematically expanding the scope of ECG-based diagnosis to conditions not traditionally associated with the cardiac system.
LGJul 26, 2022
From Interpretable Filters to Predictions of Convolutional Neural Networks with Explainable Artificial IntelligenceShagufta Henna, Juan Miguel Lopez Alcaraz
Convolutional neural networks (CNN) are known for their excellent feature extraction capabilities to enable the learning of models from data, yet are used as black boxes. An interpretation of the convolutional filtres and associated features can help to establish an understanding of CNN to distinguish various classes. In this work, we focus on the explainability of a CNN model called as cnnexplain that is used for Covid-19 and non-Covid-19 classification with a focus on the interpretability of features by the convolutional filters, and how these features contribute to classification. Specifically, we have used various explainable artificial intelligence (XAI) methods, such as visualizations, SmoothGrad, Grad-CAM, and LIME to provide interpretation of convolutional filtres, and relevant features, and their role in classification. We have analyzed the explanation of these methods for Covid-19 detection using dry cough spectrograms. Explanation results obtained from the LIME, SmoothGrad, and Grad-CAM highlight important features of different spectrograms and their relevance to classification.
SPNov 22, 2024
Abnormality Prediction and Forecasting of Laboratory Values from Electrocardiogram Signals Using Multimodal Deep LearningJuan Miguel Lopez Alcaraz, Nils Strodthoff
This study investigates the feasibility of using electrocardiogram (ECG) data combined with basic patient metadata to estimate and monitor prompt laboratory abnormalities. We use the MIMIC-IV dataset to train multimodal deep learning models on ECG waveforms, demographics, biometrics, and vital signs. Our model is a structured state space classifier with late fusion for metadata. We frame the task as individual binary classifications per abnormality and evaluate performance using AUROC. The models achieve strong performance, with AUROCs above 0.70 for 24 lab values in abnormality prediction and up to 24 in abnormality forecasting, across cardiac, renal, hematological, metabolic, immunological, and coagulation categories. NTproBNP (>353 pg/mL) is best predicted (AUROC > 0.90). Other values with AUROC > 0.85 include Hemoglobin (>17.5 g/dL), Albumin (>5.2 g/dL), and Hematocrit (>51%). Our findings show ECG combined with clinical data enables prompt abnormality prediction and forecasting of lab abnormalities, offering a non-invasive, cost-effective alternative to traditional testing. This can support early intervention and enhanced patient monitoring. ECG and clinical data can help estimate and monitor abnormal lab values, potentially improving care while reducing reliance on invasive and costly procedures.
SPDec 10, 2024
Explainable machine learning for neoplasms diagnosis via electrocardiograms: an externally validated studyJuan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
Background: Neoplasms are a major cause of mortality globally, where early diagnosis is essential for improving outcomes. Current diagnostic methods are often invasive, expensive, and inaccessible in resource-limited settings. This study explores the potential of electrocardiogram (ECG) data, a widely available and non-invasive tool for diagnosing neoplasms through cardiovascular changes linked to neoplastic presence. Methods: A diagnostic pipeline combining tree-based machine learning models with Shapley value analysis for explainability was developed. The model was trained and internally validated on a large dataset and externally validated on an independent cohort to ensure robustness and generalizability. Key ECG features contributing to predictions were identified and analyzed. Results: The model achieved high diagnostic accuracy in both internal testing and external validation cohorts. Shapley value analysis highlighted significant ECG features, including novel predictors. The approach is cost-effective, scalable, and suitable for resource-limited settings, offering insights into cardiovascular changes associated with neoplasms and their therapies. Conclusions: This study demonstrates the feasibility of using ECG signals and machine learning for non-invasive neoplasm diagnosis. By providing interpretable insights into cardio-neoplasm interactions, this method addresses gaps in diagnostics and supports integration into broader diagnostic and therapeutic frameworks.
LGDec 4, 2024
Electrocardiogram-based diagnosis of liver diseases: an externally validated and explainable machine learning approachJuan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
Background: Liver diseases present a significant global health challenge and often require costly, invasive diagnostics. Electrocardiography (ECG), a widely available and non-invasive tool, can enable the detection of liver disease by capturing cardiovascular-hepatic interactions. Methods: We trained tree-based machine learning models on ECG features to detect liver diseases using two large datasets: MIMIC-IV-ECG (467,729 patients, 2008-2019) and ECG-View II (775,535 patients, 1994-2013). The task was framed as binary classification, with performance evaluated via the area under the receiver operating characteristic curve (AUROC). To improve interpretability, we applied explainability methods to identify key predictive features. Findings: The models showed strong predictive performance with good generalizability. For example, AUROCs for alcoholic liver disease (K70) were 0.8025 (95% confidence interval (CI), 0.8020-0.8035) internally and 0.7644 (95% CI, 0.7641-0.7649) externally; for hepatic failure (K72), scores were 0.7404 (95% CI, 0.7389-0.7415) and 0.7498 (95% CI, 0.7494-0.7509), respectively. The explainability analysis consistently identified age and prolonged QTc intervals (corrected QT, reflecting ventricular repolarization) as key predictors. Features linked to autonomic regulation and electrical conduction abnormalities were also prominent, supporting known cardiovascular-liver connections and suggesting QTc as a potential biomarker. Interpretation: ECG-based machine learning offers a promising, interpretable approach for liver disease detection, particularly in resource-limited settings. By revealing clinically relevant biomarkers, this method supports non-invasive diagnostics, early detection, and risk stratification prior to targeted clinical assessments.
LGMay 24, 2024
Explaining Time Series Classification Predictions via Causal AttributionsJuan Miguel Lopez Alcaraz, Nils Strodthoff
Despite the excelling performance of machine learning models, understanding their decisions remains a long-standing goal. Although commonly used attribution methods from explainable AI attempt to address this issue, they typically rely on associational rather than causal relationships. In this study, within the context of time series classification, we introduce a novel model-agnostic attribution method to assess the causal effect of concepts i.e., predefined segments within a time series, on classification outcomes. Our approach compares these causal attributions with closely related associational attributions, both theoretically and empirically. To estimate counterfactual outcomes, we use state-of-the-art diffusion models backed by state space models. We demonstrate the insights gained by our approach for a diverse set of qualitatively different time series classification tasks. Although causal and associational attributions might often share some similarities, in all cases they differ in important details, underscoring the risks associated with drawing causal conclusions from associational data alone. We believe that the proposed approach is also widely applicable in other domains to shed some light on the limits of associational attributions.
SPSep 29, 2025
Benchmarking ECG Foundational Models: A Reality Check Across Clinical TasksM A Al-Masud, Juan Miguel Lopez Alcaraz, Nils Strodthoff
The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. Foundation models promise broader adaptability, but their generalization across diverse ECG tasks is not well understood. We benchmarked eight ECG foundation models on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes. Results show heterogeneous performance across domains: in the most widely studied domain, adult ECG interpretation, three foundation models consistently outperformed strong supervised baselines. In contrast, ECG-CPC, a compact structured state-space model pretrained on HEEDB, dominated other categories where most foundation models failed to surpass supervised learning. Foundation models also displayed distinct scaling behaviors with dataset size, which are critical for small-scale clinical applications. Overall, while foundation models show promise for adult ECG analysis, substantial gaps remain in cardiac structure, outcome prediction, and patient characterization. Notably, ECG-CPC's strong performance despite being orders of magnitude smaller and consuming minimal computational resources highlights untapped opportunities for advancing ECG foundation models.
SPDec 18, 2023
Prospects for AI-Enhanced ECG as a Unified Screening Tool for Cardiac and Non-Cardiac Conditions -- An Explorative Study in Emergency CareNils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp
Current deep learning algorithms designed for automatic ECG analysis have exhibited notable accuracy. However, akin to traditional electrocardiography, they tend to be narrowly focused and typically address a singular diagnostic condition. In this exploratory study, we specifically investigate the capability of a single model to predict a diverse range of both cardiac and non-cardiac discharge diagnoses based on a sole ECG collected in the emergency department. We find that 253, 81 cardiac, and 172 non-cardiac, ICD codes can be reliably predicted in the sense of exceeding an AUROC score of 0.8 in a statistically significant manner. This underscores the model's proficiency in handling a wide array of cardiac and non-cardiac diagnostic scenarios which demonstrates potential as a screening tool for diverse medical encounters.
SPOct 28, 2025
Towards actionable hypotension prediction -- predicting catecholamine therapy initiation in the intensive care unitRichard Koebe, Noah Saibel, Juan Miguel Lopez Alcaraz et al.
Hypotension in critically ill ICU patients is common and life-threatening. Escalation to catecholamine therapy marks a key management step, with both undertreatment and overtreatment posing risks. Most machine learning (ML) models predict hypotension using fixed MAP thresholds or MAP forecasting, overlooking the clinical decision behind treatment escalation. Predicting catecholamine initiation, the start of vasoactive or inotropic agent administration offers a more clinically actionable target reflecting real decision-making. Using the MIMIC-III database, we modeled catecholamine initiation as a binary event within a 15-minute prediction window. Input features included statistical descriptors from a two-hour sliding MAP context window, along with demographics, biometrics, comorbidities, and ongoing treatments. An Extreme Gradient Boosting (XGBoost) model was trained and interpreted via SHapley Additive exPlanations (SHAP). The model achieved an AUROC of 0.822 (0.813-0.830), outperforming the hypotension baseline (MAP < 65, AUROC 0.686 [0.675-0.699]). SHAP analysis highlighted recent MAP values, MAP trends, and ongoing treatments (e.g., sedatives, electrolytes) as dominant predictors. Subgroup analysis showed higher performance in males, younger patients (<53 years), those with higher BMI (>32), and patients without comorbidities or concurrent medications. Predicting catecholamine initiation based on MAP dynamics, treatment context, and patient characteristics supports the critical decision of when to escalate therapy, shifting focus from threshold-based alarms to actionable decision support. This approach is feasible across a broad ICU cohort under natural event imbalance. Future work should enrich temporal and physiological context, extend label definitions to include therapy escalation, and benchmark against existing hypotension prediction systems.
SPSep 22, 2025
Predicting Chest Radiograph Findings from Electrocardiograms Using Interpretable Machine LearningJulia Matejas, Olaf Żurawski, Nils Strodthoff et al.
Purpose: Chest X-rays are essential for diagnosing pulmonary conditions, but limited access in resource-constrained settings can delay timely diagnosis. Electrocardiograms (ECGs), in contrast, are widely available, non-invasive, and often acquired earlier in clinical workflows. This study aims to assess whether ECG features and patient demographics can predict chest radiograph findings using an interpretable machine learning approach. Methods: Using the MIMIC-IV database, Extreme Gradient Boosting (XGBoost) classifiers were trained to predict diverse chest radiograph findings from ECG-derived features and demographic variables. Recursive feature elimination was performed independently for each target to identify the most predictive features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) with bootstrapped 95% confidence intervals. Shapley Additive Explanations (SHAP) were applied to interpret feature contributions. Results: Models successfully predicted multiple chest radiograph findings with varying accuracy. Feature selection tailored predictors to each target, and including demographic variables consistently improved performance. SHAP analysis revealed clinically meaningful contributions from ECG features to radiographic predictions. Conclusion: ECG-derived features combined with patient demographics can serve as a proxy for certain chest radiograph findings, enabling early triage or pre-screening in settings where radiographic imaging is limited. Interpretable machine learning demonstrates potential to support radiology workflows and improve patient care.
SPFeb 7, 2025
Explainable and externally validated machine learning for neurocognitive diagnosis via electrocardiogramsJuan Miguel Lopez Alcaraz, Ebenezer Oloyede, David Taylor et al.
Background: Electrocardiogram (ECG) analysis has emerged as a promising tool for detecting physiological changes linked to non-cardiac disorders. Given the close connection between cardiovascular and neurocognitive health, ECG abnormalities may be present in individuals with co-occurring neurocognitive conditions. This highlights the potential of ECG as a biomarker to improve detection, therapy monitoring, and risk stratification in patients with neurocognitive disorders, an area that remains underexplored. Methods: We aim to demonstrate the feasibility to predict neurocognitive disorders from ECG features across diverse patient populations. We utilized ECG features and demographic data to predict neurocognitive disorders defined by ICD-10 codes, focusing on dementia, delirium, and Parkinson's disease. Internal and external validations were performed using the MIMIC-IV and ECG-View datasets. Predictive performance was assessed using AUROC scores, and Shapley values were used to interpret feature contributions. Results: Significant predictive performance was observed for disorders within the neurcognitive disorders. Significantly, the disorders with the highest predictive performance is F03: Dementia, with an internal AUROC of 0.848 (95% CI: 0.848-0.848) and an external AUROC of 0.865 (0.864-0.965), followed by G30: Alzheimer's, with an internal AUROC of 0.809 (95% CI: 0.808-0.810) and an external AUROC of 0.863 (95% CI: 0.863-0.864). Feature importance analysis revealed both known and novel ECG correlates. ECGs hold promise as non-invasive, explainable biomarkers for selected neurocognitive disorders. This study demonstrates robust performance across cohorts and lays the groundwork for future clinical applications, including early detection and personalized monitoring.