CL · SP · Dec 18, 2024

ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

CMU
arXiv:2412.14373v3 · 6 citations · h-index: 7 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem in medical AI for ECG analysis, offering an incremental improvement in efficiency and interpretability for text generation from ECG signals.

The paper tackles the inefficiency and limited interpretability of existing two-stage methods for generating text from ECG signals by proposing ECG-Byte, a tokenizer pipeline that enables direct end-to-end training. It achieves competitive performance while training 3 times faster and using just 48% of the data required by two-stage methods.

Large Language Models (LLMs) have demonstrated exceptional versatility across domains, including applications to electrocardiograms (ECGs). A growing body of work focuses on generating text from multi-channeled ECG signals and corresponding textual prompts. Existing approaches often involve a two-stage process: pretraining an ECG-specific encoder with a self-supervised learning (SSL) objective, followed by finetuning an LLM for natural language generation (NLG) using encoder-derived features. However, these methods face two key limitations: inefficiency due to multi-stage training and challenges in interpreting encoder-generated features. To overcome these issues, we propose ECG-Byte, an adapted byte pair encoding (BPE) tokenizer pipeline for autoregressive language modeling of ECGs. ECG-Byte compresses and encodes ECG signals into tokens, enabling direct end-to-end LLM training by combining ECG and text tokens. This approach enhances interpretability, as ECG tokens can be directly mapped back to the original signals. Leveraging ECG-Byte, we achieve competitive NLG performance while training 3 times faster and using just 48% of the data required by traditional two-stage methods.
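To make the idea of a BPE tokenizer over signals concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-channel signal, min-max quantization into a fixed number of bins, and a plain greedy BPE merge loop; the function names (`quantize`, `learn_merges`, `encode`, `decode`), the bin count, and the synthetic sine wave standing in for an ECG are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def quantize(signal, n_bins=16):
    """Map each sample to an integer bin in [0, n_bins - 1] via min-max scaling."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat signal
    return [min(n_bins - 1, int((x - lo) / span * n_bins)) for x in signal]

def apply_merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def learn_merges(ids, n_bins, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair into a new id."""
    merges = []          # ((left, right), new_id), in the order learned
    next_id = n_bins     # new ids start after the base quantization symbols
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:    # nothing repeats, so no further compression is possible
            break
        ids = apply_merge(ids, pair, next_id)
        merges.append((pair, next_id))
        next_id += 1
    return merges

def encode(ids, merges):
    """Apply the learned merges, in order, to a quantized symbol sequence."""
    for pair, new_id in merges:
        ids = apply_merge(ids, pair, new_id)
    return ids

def decode(tokens, merges):
    """Expand merged tokens back into the quantized symbol sequence."""
    table = {new_id: pair for pair, new_id in merges}
    stack, out = list(reversed(tokens)), []
    while stack:
        t = stack.pop()
        if t in table:
            left, right = table[t]
            stack.append(right)  # pushed first so that `left` is popped first
            stack.append(left)
        else:
            out.append(t)
    return out

# Synthetic periodic stand-in for one ECG lead (an assumption, not real data).
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
symbols = quantize(signal, n_bins=16)
merges = learn_merges(symbols, n_bins=16, num_merges=20)
tokens = encode(symbols, merges)
recovered = decode(tokens, merges)
```

In this sketch, decoding recovers the quantized symbols exactly, so each token corresponds to a concrete stretch of the signal up to the bin width. That is one way to read the interpretability claim above. How the paper handles multiple leads, quantization, and the combination of ECG tokens with text tokens is not specified here and may differ.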

Code Implementations: 2 repos