AI CLMay 24, 2025

Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework

William Han, Chaojing Duan, Zhepeng Cen, Yihang Yao, Xiaoyu Song, Atharva Mhaskar, Dylan Leong, Michael A. Rosenberg, Emerson Liu, Ding Zhao

CMU

arXiv:2505.18847v17.82 citationsh-index: 11Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses a fundamental design choice for developing ELMs, which aim to improve ECG interpretation for medical applications, though it is incremental as it compares existing representation types without introducing a new method.

The study tackled the problem of identifying the most effective input representation for Electrocardiogram-Language Models (ELMs) by benchmarking raw signals, images, and symbolic sequences across 6 datasets and 5 metrics, finding that symbolic representations achieved the greatest number of statistically significant wins.

Recent advances have increasingly applied large language models (LLMs) to electrocardiogram (ECG) interpretation, giving rise to Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual query, an ELM autoregressively generates a free-form textual response. Unlike traditional classification-based systems, ELMs emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. To realize this potential, researchers are curating instruction-tuning datasets that pair ECGs with textual dialogues and are training ELMs on these resources. Yet before scaling ELMs further, there is a fundamental question yet to be explored: What is the most effective ECG input representation? In recent works, three candidate representations have emerged-raw time-series signals, rendered images, and discretized symbolic sequences. We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics. We find symbolic representations achieve the greatest number of statistically significant wins over both signal and image inputs. We further ablate the LLM backbone, ECG duration, and token budget, and we evaluate robustness to signal perturbations. We hope that our findings offer clear guidance for selecting input representations when developing the next generation of ELMs.

View on arXiv PDF Code

Similar