CL · May 7 · Code
PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning
Hung Manh Pham, Jinyang Wu, Xiao Ma et al.
Photoplethysmography (PPG) is a widely used non-invasive sensing modality for continuous cardiovascular and physiological monitoring across clinical, laboratory, and wearable settings. While existing PPG datasets support a broad range of downstream tasks, they typically provide supervision in the form of numerical measurements or task-specific labels, limiting their compatibility with language-based interfaces and multimodal foundation models. In this work, we introduce PulseLM, a large-scale PPG-text question-answering (QA) dataset that bridges raw PPG waveforms and natural language through a unified QA formulation. PulseLM aggregates PPG recordings from 16 publicly available sources and harmonizes heterogeneous annotations into 12 downstream tasks. The dataset comprises over 1 million standardized 10-second PPG segments, associated with nearly 2.5 million question-answer pairs. We further define a reproducible data pipeline, together with training and evaluation protocols, and establish baseline benchmarks using multimodal PPG-aware large language models. PulseLM provides a standardized foundation for studying language-grounded physiological inference, cross-dataset generalization, and scalable benchmarking of PPG-based multimodal models. We publicly release the dataset and code at https://huggingface.co/datasets/Manhph2211/PulseLM and https://github.com/manhph2211/PULSE-LM, respectively.
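As a rough illustration of the QA formulation described above, the sketch below cuts a recording into fixed 10-second windows and wraps a numerical label in a question-answer pair. It is not the PulseLM pipeline: the names `segment_signal`, `label_to_qa`, and `QAPair`, the question template, and the 125 Hz sampling rate in the usage note are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAPair:
    segment: List[float]  # one fixed-length PPG window
    question: str
    answer: str


def segment_signal(signal: List[float], fs_hz: float, window_s: float = 10.0) -> List[List[float]]:
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    Any trailing samples that do not fill a whole window are dropped.
    """
    n = int(round(fs_hz * window_s))
    if n <= 0:
        raise ValueError("fs_hz * window_s must be positive")
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def label_to_qa(segment: List[float], heart_rate_bpm: float) -> QAPair:
    """Turn a numerical label into a QA pair (hypothetical template)."""
    question = "What is the heart rate in this PPG segment?"
    answer = f"{heart_rate_bpm:.0f} bpm"
    return QAPair(segment=segment, question=question, answer=answer)
```

For instance, a 2600-sample recording at an assumed 125 Hz yields two 1250-sample windows, and the remaining 100 samples are discarded.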
LG · Apr 2
Learning ECG Image Representations via Dual Physiological-Aware Alignments
Hung Manh Pham, Jialu Tang, Aaqib Saeed et al.
Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on access to raw signal recordings, limiting their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised framework for learning clinically generalized representations from ECG images through dual physiological-aware alignments: (1) we optimize image representation learning using multimodal contrastive alignment between the image and gold-standard signal-text modalities; and (2) we integrate domain knowledge via soft-lead constraints, which regularize the reconstruction process and improve consistency across signal leads. Extensive benchmarking across multiple datasets and downstream tasks demonstrates that our image-based model achieves superior performance compared to existing image baselines and notably narrows the gap between ECG image and signal analysis. These results highlight the potential of self-supervised image modeling to unlock large-scale legacy ECG data and broaden access to automated cardiovascular diagnostics.
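The abstract does not state which contrastive objective ECG-Scan uses. As one common instance of multimodal contrastive alignment, the sketch below computes a symmetric InfoNCE (CLIP-style) loss between a batch of image embeddings and their paired target embeddings. The function names and the temperature of 0.07 are assumptions for illustration, not details from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_info_nce(image_emb: List[Vector], target_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; row i of each batch is assumed to be a matched pair."""
    a = [l2_normalize(v) for v in image_emb]
    b = [l2_normalize(v) for v in target_emb]
    # Cosine similarity of every image with every target, scaled by temperature.
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-target and target-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

The loss is small when each image embedding is closest to its own paired target and large when pairs are mismatched, which is the property a contrastive alignment stage relies on.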
LG · Oct 24, 2025
Interpretable Multimodal Zero-Shot ECG Diagnosis via Structured Clinical Knowledge Alignment
Jialu Tang, Hung Manh Pham, Ignace De Lathauwer et al.
Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive and negative clinical observations, which are curated through an LLM-assisted, expert-validated process, thereby mimicking differential diagnosis. Our approach leverages a pre-trained multimodal model to align ECG and text embeddings without disease-specific fine-tuning. Empirical evaluations demonstrate ZETA's competitive zero-shot classification performance and, importantly, provide qualitative and quantitative evidence of enhanced interpretability, grounding predictions in specific, clinically relevant positive and negative diagnostic features. ZETA underscores the potential of aligning ECG analysis with structured clinical knowledge for building more transparent, generalizable, and trustworthy AI diagnostic systems. We will release the curated observation dataset and code to facilitate future research.
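The abstract does not give ZETA's scoring rule. One plausible reading of comparing an ECG against positive and negative observations is sketched below: each candidate diagnosis is scored by the mean cosine similarity to its positive observations minus the mean similarity to its negative ones. The scoring formula, the function names, and the toy embeddings are assumptions for illustration only; in practice the embeddings would come from a pre-trained multimodal encoder.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def observation_score(ecg_emb: Vector, positive: List[Vector], negative: List[Vector]) -> float:
    """Mean similarity to positive observations minus mean similarity to negative ones."""
    pos = sum(cosine(ecg_emb, p) for p in positive) / len(positive)
    neg = sum(cosine(ecg_emb, n) for n in negative) / len(negative)
    return pos - neg


def zero_shot_predict(ecg_emb: Vector,
                      candidates: Dict[str, Dict[str, List[Vector]]]) -> Tuple[str, Dict[str, float]]:
    """Return the best-scoring candidate and the per-candidate scores."""
    scores = {
        name: observation_score(ecg_emb, obs["positive"], obs["negative"])
        for name, obs in candidates.items()
    }
    return max(scores, key=scores.get), scores
```

Because the per-observation similarities are computed individually, they can be reported alongside the prediction, which is one way such a scheme could ground its output in specific diagnostic features.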