SPOct 11, 2023
Using explainable AI to investigate electrocardiogram changes during healthy aging -- from expert features to raw signalsGabriel Ott, Yannik Schaubelt, Juan Miguel Lopez Alcaraz et al.
Cardiovascular diseases remain the leading global cause of mortality. Age is an important covariate whose effect is most easily investigated in a healthy cohort to properly distinguish the former from disease-related changes. Traditionally, most of such insights have been drawn from the analysis of electrocardiogram (ECG) feature changes in individuals as they age. However, these features, while informative, may potentially obscure underlying data relationships. In this paper we present the following contributions: (1) We employ a deep-learning model and a tree-based model to analyze ECG data from a robust dataset of healthy individuals across varying ages in both raw signals and ECG feature format. (2) We use explainable AI methods to identify the most discriminative ECG features across age groups.(3) Our analysis with tree-based classifiers reveals age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. (4) Furthermore, the deep-learning model underscores the pivotal role of the P-wave in age predictions across all age groups, suggesting potential changes in the distribution of different P-wave types with age. These findings shed new light on age-related ECG changes, offering insights that transcend traditional feature-based approaches.
SPDec 10, 2024
Explainable machine learning for neoplasms diagnosis via electrocardiograms: an externally validated studyJuan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
Background: Neoplasms are a major cause of mortality globally, where early diagnosis is essential for improving outcomes. Current diagnostic methods are often invasive, expensive, and inaccessible in resource-limited settings. This study explores the potential of electrocardiogram (ECG) data, a widely available and non-invasive tool for diagnosing neoplasms through cardiovascular changes linked to neoplastic presence. Methods: A diagnostic pipeline combining tree-based machine learning models with Shapley value analysis for explainability was developed. The model was trained and internally validated on a large dataset and externally validated on an independent cohort to ensure robustness and generalizability. Key ECG features contributing to predictions were identified and analyzed. Results: The model achieved high diagnostic accuracy in both internal testing and external validation cohorts. Shapley value analysis highlighted significant ECG features, including novel predictors. The approach is cost-effective, scalable, and suitable for resource-limited settings, offering insights into cardiovascular changes associated with neoplasms and their therapies. Conclusions: This study demonstrates the feasibility of using ECG signals and machine learning for non-invasive neoplasm diagnosis. By providing interpretable insights into cardio-neoplasm interactions, this method addresses gaps in diagnostics and supports integration into broader diagnostic and therapeutic frameworks.
LGDec 4, 2024
Electrocardiogram-based diagnosis of liver diseases: an externally validated and explainable machine learning approachJuan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
Background: Liver diseases present a significant global health challenge and often require costly, invasive diagnostics. Electrocardiography (ECG), a widely available and non-invasive tool, can enable the detection of liver disease by capturing cardiovascular-hepatic interactions. Methods: We trained tree-based machine learning models on ECG features to detect liver diseases using two large datasets: MIMIC-IV-ECG (467,729 patients, 2008-2019) and ECG-View II (775,535 patients, 1994-2013). The task was framed as binary classification, with performance evaluated via the area under the receiver operating characteristic curve (AUROC). To improve interpretability, we applied explainability methods to identify key predictive features. Findings: The models showed strong predictive performance with good generalizability. For example, AUROCs for alcoholic liver disease (K70) were 0.8025 (95% confidence interval (CI), 0.8020-0.8035) internally and 0.7644 (95% CI, 0.7641-0.7649) externally; for hepatic failure (K72), scores were 0.7404 (95% CI, 0.7389-0.7415) and 0.7498 (95% CI, 0.7494-0.7509), respectively. The explainability analysis consistently identified age and prolonged QTc intervals (corrected QT, reflecting ventricular repolarization) as key predictors. Features linked to autonomic regulation and electrical conduction abnormalities were also prominent, supporting known cardiovascular-liver connections and suggesting QTc as a potential biomarker. Interpretation: ECG-based machine learning offers a promising, interpretable approach for liver disease detection, particularly in resource-limited settings. By revealing clinically relevant biomarkers, this method supports non-invasive diagnostics, early detection, and risk stratification prior to targeted clinical assessments.
SPDec 18, 2023
Prospects for AI-Enhanced ECG as a Unified Screening Tool for Cardiac and Non-Cardiac Conditions -- An Explorative Study in Emergency CareNils Strodthoff, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp
Current deep learning algorithms designed for automatic ECG analysis have exhibited notable accuracy. However, akin to traditional electrocardiography, they tend to be narrowly focused and typically address a singular diagnostic condition. In this exploratory study, we specifically investigate the capability of a single model to predict a diverse range of both cardiac and non-cardiac discharge diagnoses based on a sole ECG collected in the emergency department. We find that 253, 81 cardiac, and 172 non-cardiac, ICD codes can be reliably predicted in the sense of exceeding an AUROC score of 0.8 in a statistically significant manner. This underscores the model's proficiency in handling a wide array of cardiac and non-cardiac diagnostic scenarios which demonstrates potential as a screening tool for diverse medical encounters.
CLOct 21, 2025
ECG-LLM -- training and evaluation of domain-specific large language models for electrocardiographyLara Ahrens, Wilhelm Haverkamp, Nils Strodthoff
Domain-adapted open-weight large language models (LLMs) offer promising healthcare applications, from queryable knowledge bases to multimodal assistants, with the crucial advantage of local deployment for privacy preservation. However, optimal adaptation strategies, evaluation methodologies, and performance relative to general-purpose LLMs remain poorly characterized. We investigated these questions in electrocardiography, an important area of cardiovascular medicine, by finetuning open-weight models on domain-specific literature and implementing a multi-layered evaluation framework comparing finetuned models, retrieval-augmented generation (RAG), and Claude Sonnet 3.7 as a representative general-purpose model. Finetuned Llama 3.1 70B achieved superior performance on multiple-choice evaluations and automatic text metrics, ranking second to Claude 3.7 in LLM-as-a-judge assessments. Human expert evaluation favored Claude 3.7 and RAG approaches for complex queries. Finetuned models significantly outperformed their base counterparts across nearly all evaluation modes. Our findings reveal substantial performance heterogeneity across evaluation methodologies, underscoring assessment complexity. Nevertheless, domain-specific adaptation through finetuning and RAG achieves competitive performance with proprietary models, supporting the viability of privacy-preserving, locally deployable clinical solutions.
LGOct 20, 2025
S4ECG: Exploring the impact of long-range interactions for arrhythmia predictionTiezhi Wang, Wilhelm Haverkamp, Nils Strodthoff
The electrocardiogram (ECG) exemplifies biosignal-based time series with continuous, temporally ordered structure reflecting cardiac physiological and pathophysiological dynamics. Detailed analysis of these dynamics has proven challenging, as conventional methods capture either global trends or local waveform features but rarely their simultaneous interplay at high temporal resolution. To bridge global and local signal analysis, we introduce S4ECG, a novel deep learning architecture leveraging structured state space models for multi-epoch arrhythmia classification. Our joint multi-epoch predictions significantly outperform single-epoch approaches by 1.0-11.6% in macro-AUROC, with atrial fibrillation specificity improving from 0.718-0.979 to 0.967-0.998, demonstrating superior performance in-distribution and enhanced out-of-distribution robustness. Systematic investigation reveals optimal temporal dependency windows spanning 10-20 minutes for peak performance. This work contributes to a paradigm shift toward temporally-aware arrhythmia detection algorithms, opening new possibilities for ECG interpretation, in particular for complex arrhythmias like atrial fibrillation and atrial flutter.
SPFeb 7, 2025
Explainable and externally validated machine learning for neurocognitive diagnosis via electrocardiogramsJuan Miguel Lopez Alcaraz, Ebenezer Oloyede, David Taylor et al.
Background: Electrocardiogram (ECG) analysis has emerged as a promising tool for detecting physiological changes linked to non-cardiac disorders. Given the close connection between cardiovascular and neurocognitive health, ECG abnormalities may be present in individuals with co-occurring neurocognitive conditions. This highlights the potential of ECG as a biomarker to improve detection, therapy monitoring, and risk stratification in patients with neurocognitive disorders, an area that remains underexplored. Methods: We aim to demonstrate the feasibility to predict neurocognitive disorders from ECG features across diverse patient populations. We utilized ECG features and demographic data to predict neurocognitive disorders defined by ICD-10 codes, focusing on dementia, delirium, and Parkinson's disease. Internal and external validations were performed using the MIMIC-IV and ECG-View datasets. Predictive performance was assessed using AUROC scores, and Shapley values were used to interpret feature contributions. Results: Significant predictive performance was observed for disorders within the neurcognitive disorders. Significantly, the disorders with the highest predictive performance is F03: Dementia, with an internal AUROC of 0.848 (95% CI: 0.848-0.848) and an external AUROC of 0.865 (0.864-0.965), followed by G30: Alzheimer's, with an internal AUROC of 0.809 (95% CI: 0.808-0.810) and an external AUROC of 0.863 (95% CI: 0.863-0.864). Feature importance analysis revealed both known and novel ECG correlates. ECGs hold promise as non-invasive, explainable biomarkers for selected neurocognitive disorders. This study demonstrates robust performance across cohorts and lays the groundwork for future clinical applications, including early detection and personalized monitoring.
SPMay 26, 2023
Explaining Deep Learning for ECG Analysis: Building Blocks for Auditing and Knowledge DiscoveryPatrick Wagner, Temesgen Mehari, Wilhelm Haverkamp et al.
Deep neural networks have become increasingly popular for analyzing ECG data because of their ability to accurately identify cardiac conditions and hidden clinical factors. However, the lack of transparency due to the black box nature of these models is a common concern. To address this issue, explainable AI (XAI) methods can be employed. In this study, we present a comprehensive analysis of post-hoc XAI methods, investigating the local (attributions per sample) and global (based on domain expert concepts) perspectives. We have established a set of sanity checks to identify sensible attribution methods, and we provide quantitative evidence in accordance with expert rules. This dataset-wide analysis goes beyond anecdotal evidence by aggregating data across patient subgroups. Furthermore, we demonstrate how these XAI techniques can be utilized for knowledge discovery, such as identifying subtypes of myocardial infarction. We believe that these proposed methods can serve as building blocks for a complementary assessment of the internal validity during a certification process, as well as for knowledge discovery in the field of ECG analysis.