18.1CVMay 18
Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography OutcomesYujie Xiao, Qinghao Zhao, Gongzheng Tang et al.
CAD remains a major global public health burden, yet scalable screening tools are limited. Although CCTA is a first-line non-invasive diagnostic modality, its use is constrained by resource requirements and radiation exposure. AI-ECG may offer a complementary approach for CAD risk stratification. In this multicenter study, we developed and validated an AI-ECG model using CCTA as the anatomical reference standard to predict vessel-specific coronary stenosis. In internal validation, the model achieved AUC values of 0.683-0.744 across vessels and showed consistent external performance. Discrimination was maintained in clinically normal ECGs and remained broadly stable across subgroups. Model-predicted probabilities increased monotonically with CCTA-defined stenosis severity. Model probabilities were converted into vessel-specific low-, intermediate-, and high-risk strata using predefined sensitivity- and specificity-based thresholds. Calibration analysis showed agreement between predicted and observed risk, while DCA indicated net clinical benefit over treat-all and treat-none strategies. Integrating AI-derived risk strata with guideline-based PTP categories improved rule-out performance, reduced the gray-zone proportion, and achieved positive NRI compared with PTP alone. In a longitudinal follow-up cohort, Kaplan-Meier analysis showed clear separation of major adverse cardiovascular event risk across model-defined risk groups. Waveform- and attribution-based analyses further identified structured ECG morphology differences and physiologically meaningful signal regions associated with high-risk predictions. These findings support AI-ECG as a feasible tool for complementary CAD screening, anatomical risk estimation, and clinical triage, while prospective studies are needed to confirm its clinical impact.
CVFeb 10, 2023
Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram ImagesDeyun Zhang, Shijia Geng, Yang Zhou et al.
The artificial intelligence (AI) system has achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments or interference. In this study, we present an AI system developed to detect and screen cardiac abnormalities (CAs) from real-world ECG images. The system was evaluated on a large dataset of 52,357 patients from multiple regions and populations across the world. On the detection task, the AI system obtained area under the receiver operating curve (AUC) of 0.996 (hold-out test), 0.994 (external test 1), 0.984 (external test 2), and 0.979 (external test 3), respectively. Meanwhile, the detection results of AI system showed a strong correlation with the diagnosis of cardiologists (cardiologist 1 (R=0.794, p<1e-3), cardiologist 2 (R=0.812, p<1e-3)). On the screening task, the AI system achieved AUCs of 0.894 (hold-out test) and 0.850 (external test). The screening performance of the AI system was better than that of the cardiologists (AI system (0.846) vs. cardiologist 1 (0.520) vs. cardiologist 2 (0.480)). Our study demonstrates the feasibility of an accurate, objective, easy-to-use, fast, and low-cost AI system for CA detection and screening. The system has the potential to be used by healthcare professionals, caregivers, and general users to assess CAs based on real-world ECG images.
CLMar 8, 2025Code
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and ImagesXiang Lan, Feng Wu, Kai He et al.
While recent multimodal large language models (MLLMs) have advanced automated ECG interpretation, they still face two key limitations: (1) insufficient multimodal synergy between time series signals and visual ECG representations, and (2) limited explainability in linking diagnoses to granular waveform evidence. We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations: a dual-encoder framework extracting complementary time series and image features, cross-modal alignment for effective multimodal understanding, and knowledge-guided instruction generation for generating high-granularity grounding data (ECG-Grounding) linking diagnoses to measurable parameters ($e.g.$, QRS/PR Intervals). Additionally, we propose the Grounded ECG Understanding task, a clinically motivated benchmark designed to comprehensively assess the MLLM's capability in grounded ECG understanding. Experimental results on both existing and our proposed benchmarks show GEM significantly improves predictive performance (CSN $7.4\% \uparrow$), explainability ($22.7\% \uparrow$), and grounding ($24.8\% \uparrow$), making it more suitable for real-world clinical applications. GitHub repository: https://github.com/lanxiang1017/GEM.git
SPFeb 4
Aortic Valve Disease Detection from PPG via Physiology-Informed Self-Supervised LearningJiaze Wang, Qinghao Zhao, Zizheng Chen et al.
Traditional diagnosis of aortic valve disease relies on echocardiography, but its cost and required expertise limit its use in large-scale early screening. Photoplethysmography (PPG) has emerged as a promising screening modality due to its widespread availability in wearable devices and its ability to reflect underlying hemodynamic dynamics. However, the extreme scarcity of gold-standard labeled PPG data severely constrains the effectiveness of data-driven approaches. To address this challenge, we propose and validate a new paradigm, Physiology-Guided Self-Supervised Learning (PG-SSL), aimed at unlocking the value of large-scale unlabeled PPG data for efficient screening of Aortic Stenosis (AS) and Aortic Regurgitation (AR). Using over 170,000 unlabeled PPG samples from the UK Biobank, we formalize clinical knowledge into a set of PPG morphological phenotypes and construct a pulse pattern recognition proxy task for self-supervised pre-training. A dual-branch, gated-fusion architecture is then employed for efficient fine-tuning on a small labeled subset. The proposed PG-SSL framework achieves AUCs of 0.765 and 0.776 for AS and AR screening, respectively, significantly outperforming supervised baselines trained on limited labeled data. Multivariable analysis further validates the model output as an independent digital biomarker with sustained prognostic value after adjustment for standard clinical risk factors. This study demonstrates that PG-SSL provides an effective, domain knowledge-driven solution to label scarcity in medical artificial intelligence and shows strong potential for enabling low-cost, large-scale early screening of aortic valve disease.
23.1LGMar 15
Artificial intelligence-enabled single-lead ECG for non-invasive hyperkalemia detection: development, multicenter validation, and proof-of-concept deploymentGongzheng Tang, Qinghao Zhao, Guangkun Nie et al.
Hyperkalemia is a life-threatening electrolyte disorder that is common in patients with chronic kidney disease and heart failure, yet frequent monitoring remains difficult outside hospital settings. We developed and validated Pocket-K, a single-lead AI-ECG system initialized from the ECGFounder foundation model for non-invasive hyperkalemia screening and handheld deployment. In this multicentre observational study using routinely collected clinical ECG and laboratory data, 34,439 patients contributed 62,290 ECG--potassium pairs. Lead I data were used to fine-tune the model. Data from Peking University People's Hospital were divided into development and temporal validation sets, and data from The Second Hospital of Tianjin Medical University served as an independent external validation set. Hyperkalemia was defined as venous serum potassium > 5.5 mmol/L. Pocket-K achieved AUROCs of 0.936 in internal testing, 0.858 in temporal validation, and 0.808 in external validation. For KDIGO-defined moderate-to-severe hyperkalemia (serum potassium >= 6.0 mmol/L), AUROCs increased to 0.940 and 0.861 in the temporal and external sets, respectively. External negative predictive value exceeded 99.3%. Model-predicted high risk below the hyperkalemia threshold was more common in patients with chronic kidney disease and heart failure. A handheld prototype enabled near-real-time inference, supporting future prospective evaluation in native handheld and wearable settings.
SPFeb 21, 2025Code
On-device Computation of Single-lead ECG Parameters for Real-time Remote Cardiac Health Assessment: A Real-world Validation StudySumei Fan, Deyun Zhang, Yue Wang et al.
Accurate, continuous out-of-hospital electrocardiogram (ECG) parameter measurement is vital for real-time cardiac health monitoring and telemedicine. On-device computation of single-lead ECG parameters enables timely assessment without reliance on centralized data processing, advancing personalized, ubiquitous cardiac care-yet comprehensive validation across heterogeneous real-world populations remains limited. This study validated the on-device algorithm FeatureDB (https://github.com/PKUDigitalHealth/FeatureDB) using two datasets: HeartVoice-ECG-lite (369 participants with single-lead ECGs annotated by two physicians) and PTB-XL/PTB-XL+ (21,354 patients with 12-lead ECGs and physicians' diagnostic annotations). FeatureDB computed PR, QT, and QTc intervals, with accuracy evaluated against physician annotations via mean absolute error (MAE), correlation analysis, and Bland-Altman analysis. Diagnostic performance for first-degree atrioventricular block (AVBI, PR-based) and long QT syndrome (LQT, QTc-based) was benchmarked against commercial 12-lead systems (12SL, Uni-G) and open-source algorithm Deli, using AUC, accuracy, sensitivity, and specificity. Results showed high concordance with expert annotations (Pearson correlations: 0.836-0.960), MAEs matching inter-observer variability, and minimal bias. AVBI AUC reached 0.787 (12SL: 0.859; Uni-G: 0.812; Deli: 0.501); LQT AUC was 0.684 (12SL: 0.716; Uni-G: 0.605; Deli: 0.569)-comparable to commercial tools and superior to open-source alternatives. FeatureDB delivers physician-level parameter accuracy and commercial-grade abnormality detection via single-lead devices, supporting scalable telemedicine, decentralized cardiac screening, and continuous monitoring in community and outpatient settings.
CVJun 21, 2024Code
Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular HealthGuangkun Nie, Qinghao Zhao, Gongzheng Tang et al.
Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validate its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.
AISep 2, 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement LearningHaoming Wang, Haoyang Zou, Huatong Song et al. · pku
The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability. In this technical report, we present UI-TARS-2, a native GUI-centered agent model that addresses these challenges through a systematic training methodology: a data flywheel for scalable data generation, a stabilized multi-turn RL framework, a hybrid GUI environment that integrates file systems and terminals, and a unified sandbox platform for large-scale rollouts. Empirical evaluation demonstrates that UI-TARS-2 achieves significant improvements over its predecessor UI-TARS-1.5. On GUI benchmarks, it reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld, outperforming strong baselines such as Claude and OpenAI agents. In game environments, it attains a mean normalized score of 59.8 across a 15-game suite-roughly 60% of human-level performance-and remains competitive with frontier proprietary models (e.g., OpenAI o3) on LMGame-Bench. Additionally, the model can generalize to long-horizon information-seeking tasks and software engineering benchmarks, highlighting its robustness across diverse agent tasks. Detailed analyses of training dynamics further provide insights into achieving stability and efficiency in large-scale agent RL. These results underscore UI-TARS-2's potential to advance the state of GUI agents and exhibit strong generalization to real-world interactive scenarios.
AIJan 23, 2024
A Review of Deep Learning Methods for Photoplethysmography DataGuangkun Nie, Jiabao Zhu, Gongzheng Tang et al.
Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data between January 1st of 2017 and July 31st of 2023 from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We finally extracted 193 papers where different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related, and non-medical-related. The medical-related tasks were further divided into seven subgroups, including blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, as well as others. The non-medical-related tasks were divided into four subgroups, which encompass signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has been made in the field of using deep learning methods to process PPG data recently. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.
LGOct 25, 2025
AnyECG-Lab: An Exploration Study of Fine-tuning an ECG Foundation Model to Estimate Laboratory Values from Single-Lead ECG SignalsYujie Xiao, Gongzhen Tang, Wenhui Liu et al.
Timely access to laboratory values is critical for clinical decision-making, yet current approaches rely on invasive venous sampling and are intrinsically delayed. Electrocardiography (ECG), as a non-invasive and widely available signal, offers a promising modality for rapid laboratory estimation. Recent progress in deep learning has enabled the extraction of latent hematological signatures from ECGs. However, existing models are constrained by low signal-to-noise ratios, substantial inter-individual variability, limited data diversity, and suboptimal generalization, especially when adapted to low-lead wearable devices. In this work, we conduct an exploratory study leveraging transfer learning to fine-tune ECGFounder, a large-scale pre-trained ECG foundation model, on the Multimodal Clinical Monitoring in the Emergency Department (MC-MED) dataset from Stanford. We generated a corpus of more than 20 million standardized ten-second ECG segments to enhance sensitivity to subtle biochemical correlates. On internal validation, the model demonstrated strong predictive performance (area under the curve above 0.65) for thirty-three laboratory indicators, moderate performance (between 0.55 and 0.65) for fifty-nine indicators, and limited performance (below 0.55) for sixteen indicators. This study provides an efficient artificial-intelligence driven solution and establishes the feasibility scope for real-time, non-invasive estimation of laboratory values.
LGNov 17, 2025
Artificial Intelligence-Enabled Spirometry for Early Detection of Right Heart FailureBin Liu, Qinghao Zhao, Yuxi Zhou et al.
Right heart failure (RHF) is a disease characterized by abnormalities in the structure or function of the right ventricle (RV), which is associated with high morbidity and mortality. Lung disease often causes increased right ventricular load, leading to RHF. Therefore, it is very important to screen out patients with cor pulmonale who develop RHF from people with underlying lung diseases. In this work, we propose a self-supervised representation learning method to early detecting RHF from patients with cor pulmonale, which uses spirogram time series to predict patients with RHF at an early stage. The proposed model is divided into two stages. The first stage is the self-supervised representation learning-based spirogram embedding (SLSE) network training process, where the encoder of the Variational autoencoder (VAE-encoder) learns a robust low-dimensional representation of the spirogram time series from the data-augmented unlabeled data. Second, this low-dimensional representation is fused with demographic information and fed into a CatBoost classifier for the downstream RHF prediction task. Trained and tested on a carefully selected subset of 26,617 individuals from the UK Biobank, our model achieved an AUROC of 0.7501 in detecting RHF, demonstrating strong population-level distinction ability. We further evaluated the model on high-risk clinical subgroups, achieving AUROC values of 0.8194 on a test set of 74 patients with chronic kidney disease (CKD) and 0.8413 on a set of 64 patients with valvular heart disease (VHD). These results highlight the model's potential utility in predicting RHF among clinically elevated-risk populations. In conclusion, this study presents a self-supervised representation learning approach combining spirogram time series and demographic data, demonstrating promising potential for early RHF detection in clinical practice.
LGOct 13, 2025
Reconstructing 12-Lead ECG from 3-Lead ECG using Variational Autoencoder to Improve Cardiac Disease Detection of Wearable ECG DevicesXinyan Guan, Yongfan Lai, Jiarui Jin et al.
Twelve-lead electrocardiograms (ECGs) are the clinical gold standard for cardiac diagnosis, providing comprehensive spatial coverage of the heart necessary to detect conditions such as myocardial infarction (MI). However, their lack of portability limits continuous and large-scale use. Three-lead ECG systems are widely used in wearable devices due to their simplicity and mobility, but they often fail to capture pathologies in unmeasured regions. To address this, we propose WearECG, a Variational Autoencoder (VAE) method that reconstructs twelve-lead ECGs from three leads: II, V1, and V5. Our model includes architectural improvements to better capture temporal and spatial dependencies in ECG signals. We evaluate generation quality using MSE, MAE, and Frechet Inception Distance (FID), and assess clinical validity via a Turing test with expert cardiologists. To further validate diagnostic utility, we fine-tune ECGFounder, a large-scale pretrained ECG model, on a multi-label classification task involving over 40 cardiac conditions, including six different myocardial infarction locations, using both real and generated signals. Experiments on the MIMIC dataset show that our method produces physiologically realistic and diagnostically informative signals, with robust performance in downstream tasks. This work demonstrates the potential of generative modeling for ECG reconstruction and its implications for scalable, low-cost cardiac screening.
LGAug 6, 2025
Masked Training for Robust Arrhythmia Detection from Digitalized Multiple Layout ECG ImagesShanwei Zhang, Deyun Zhang, Yirao Tao et al.
Electrocardiogram (ECG) as an important tool for diagnosing cardiovascular diseases such as arrhythmia. Due to the differences in ECG layouts used by different hospitals, the digitized signals exhibit asynchronous lead time and partial blackout loss, which poses a serious challenge to existing models. To address this challenge, the study introduced PatchECG, a framework for adaptive variable block count missing representation learning based on a masking training strategy, which automatically focuses on key patches with collaborative dependencies between leads, thereby achieving key recognition of arrhythmia in ECGs with different layouts. Experiments were conducted on the PTB-XL dataset and 21388 asynchronous ECG images generated using ECG image kit tool, using the 23 Subclasses as labels. The proposed method demonstrated strong robustness under different layouts, with average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.835 and remained stable (unchanged with layout changes). In external validation based on 400 real ECG images data from Chaoyang Hospital, the AUROC for atrial fibrillation diagnosis reached 0.778; On 12 x 1 layout ECGs, AUROC reaches 0.893. This result is superior to various classic interpolation and baseline methods, and compared to the current optimal large-scale pre-training model ECGFounder, it has improved by 0.111 and 0.19.
LGAug 1, 2025
ECGTwin: Personalized ECG Generation Using Controllable Diffusion ModelYongfan Lai, Bo Liu, Xinyan Guan et al.
Personalized electrocardiogram (ECG) generation is to simulate a patient's ECG digital twins tailored to specific conditions. It has the potential to transform traditional healthcare into a more accurate individualized paradigm, while preserving the key benefits of conventional population-level ECG synthesis. However, this promising task presents two fundamental challenges: extracting individual features without ground truth and injecting various types of conditions without confusing generative model. In this paper, we present ECGTwin, a two-stage framework designed to address these challenges. In the first stage, an Individual Base Extractor trained via contrastive learning robustly captures personal features from a reference ECG. In the second stage, the extracted individual features, along with a target cardiac condition, are integrated into the diffusion-based generation process through our novel AdaX Condition Injector, which injects these signals via two dedicated and specialized pathways. Both qualitative and quantitative experiments have demonstrated that our model can not only generate ECG signals of high fidelity and diversity by offering a fine-grained generation controllability, but also preserving individual-specific features. Furthermore, ECGTwin shows the potential to enhance ECG auto-diagnosis in downstream application, confirming the possibility of precise personalized healthcare solutions.
SPJul 21, 2025
MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and InterpretationsDeyun Zhang, Xiang Lan, Shijia Geng et al.
Electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. While machine learning has achieved expert-level performance in ECG interpretation, the development of clinically deployable multimodal AI systems remains constrained, primarily due to the lack of publicly available datasets that simultaneously incorporate raw signals, diagnostic images, and interpretation text. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. To address this gap, we introduce MEETI (MIMIC-IV-Ext ECG-Text-Image), the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models. In addition, MEETI includes beat-level quantitative ECG parameters extracted from each lead, offering structured parameters that support fine-grained analysis and model interpretability. Each MEETI record is aligned across four components: (1) the raw ECG waveform, (2) the corresponding plotted image, (3) extracted feature parameters, and (4) detailed interpretation text. This alignment is achieved using consistent, unique identifiers. This unified structure supports transformer-based multimodal learning and supports fine-grained, interpretable reasoning about cardiac health. By bridging the gap between traditional signal analysis, image-based interpretation, and language-driven understanding, MEETI established a robust foundation for the next generation of explainable, multimodal cardiovascular AI. It offers the research community a comprehensive benchmark for developing and evaluating ECG-based AI systems.
LGMay 6, 2024
Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time SeriesShuhao Mei, Xin Li, Yuxi Zhou et al.
Chronic Obstructive Pulmonary Disease (COPD) is a chronic lung condition characterized by airflow obstruction. Current diagnostic methods primarily rely on identifying prominent features in spirometry (Volume-Flow time series) to detect COPD, but they are not adept at predicting future COPD risk based on subtle data patterns. In this study, we introduce a novel deep learning-based approach, DeepSpiro, aimed at the early prediction of future COPD risk. DeepSpiro consists of four key components: SpiroSmoother for stabilizing the Volume-Flow curve, SpiroEncoder for capturing volume variability-pattern through key patches of varying lengths, SpiroExplainer for integrating heterogeneous data and explaining predictions through volume attention, and SpiroPredictor for predicting the disease risk of undiagnosed high-risk patients based on key patch concavity, with prediction horizons of 1, 2, 3, 4, 5 years, or even longer. Evaluated on the UK Biobank dataset, DeepSpiro achieved an AUC of 0.8328 for COPD detection and demonstrated strong predictive performance for future COPD risk (p-value < 0.001). In summary, DeepSpiro can effectively predicts the long-term progression of the COPD disease.