Encoder-Free ECG-Language Models
This work addresses the architectural complexity of models for automated ECG interpretation in medical AI, though its contribution is incremental: it adapts encoder-free designs from vision-language models to ECG.
The paper tackles architectural and training complexity in ECG-Language Models (ELMs) by introducing ELF, an encoder-free ELM that uses a single projection layer. ELF matches or exceeds state-of-the-art ELMs across five datasets, and the analysis also reveals that it relies on benchmark artifacts and language priors.
ECG-Language Models (ELMs) extend recent progress in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most ELMs follow Vision-Language Model (VLM) designs and depend on pretrained ECG encoders, adding architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, an encoder-free ELM that replaces the ECG encoder with a single projection layer trained jointly with the LLM. Across five datasets, ELF matches or exceeds state-of-the-art ELMs that use far more complex encoders and training pipelines. We also test whether adding architectural biases to ELF improves performance and find that the single linear projection remains competitive. Finally, we show that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than ECG-derived information, highlighting limitations in current evaluation practices and ELM design. All data and code are available at https://github.com/willxxy/ECG-Bench.
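The abstract does not say how the raw ECG is turned into inputs for the projection layer. The sketch below assumes one plausible scheme: the multi-lead signal is cut into fixed-length, non-overlapping time patches, each patch is flattened across leads, and a single linear layer maps it to the LLM's embedding width so the resulting "ECG tokens" can be prepended to the text token embeddings. The names `patchify`, `LinearProjection`, and `build_llm_input`, the patch length, and the embedding size are all hypothetical, and the joint training of the projection with the LLM is omitted. The actual ELF implementation in ECG-Bench may differ.

```python
import random
from typing import List

Matrix = List[List[float]]


def patchify(ecg: Matrix, patch_len: int) -> Matrix:
    """Split a (leads x samples) ECG into non-overlapping time patches.

    Hypothetical tokenization: each patch gathers `patch_len` consecutive
    samples from every lead and flattens them into one vector of length
    leads * patch_len. Trailing samples that do not fill a patch are dropped.
    """
    n_samples = len(ecg[0])
    patches = []
    for start in range(0, n_samples - patch_len + 1, patch_len):
        patch = []
        for lead in ecg:
            patch.extend(lead[start:start + patch_len])
        patches.append(patch)
    return patches


class LinearProjection:
    """Hypothetical single affine layer mapping an ECG patch to an LLM embedding."""

    def __init__(self, in_dim: int, d_model: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / in_dim ** 0.5
        # Weight of shape (d_model, in_dim) with small random init; zero bias.
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                       for _ in range(d_model)]
        self.bias = [0.0] * d_model

    def __call__(self, x: List[float]) -> List[float]:
        # y = W x + b for one flattened patch x.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def build_llm_input(ecg: Matrix, text_embeddings: Matrix,
                    proj: LinearProjection, patch_len: int) -> Matrix:
    """Prepend projected ECG tokens to the text token embeddings."""
    ecg_tokens = [proj(p) for p in patchify(ecg, patch_len)]
    return ecg_tokens + text_embeddings


# Usage with synthetic data: 12 leads, 500 samples, 5 placeholder text tokens.
leads, samples, patch_len, d_model = 12, 500, 50, 16
rng = random.Random(1)
ecg = [[rng.gauss(0.0, 1.0) for _ in range(samples)] for _ in range(leads)]
text = [[0.0] * d_model for _ in range(5)]
proj = LinearProjection(in_dim=leads * patch_len, d_model=d_model)
seq = build_llm_input(ecg, text, proj, patch_len)
print(len(seq), len(seq[0]))  # 15 16
```

With 500 samples per lead and a patch length of 50, the 12-lead input yields 10 ECG tokens, so the combined sequence has 15 positions of width 16. Under this assumption the adapter is a single affine map, which is what makes the design encoder-free: any nonlinear processing of the signal has to happen inside the LLM layers trained jointly with it.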
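The abstract reports that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than on ECG-derived information, but it does not describe the procedure used to show this. One generic way to probe such reliance, given here as an illustration and not necessarily the paper's method, is an input ablation: score the model on the same questions with the real ECG and with a blanked ECG, then compare. The toy models, the data, and the helper names (`accuracy`, `zero_ecg`, `ablation_gap`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

ECG = List[List[float]]            # leads x samples
Example = Tuple[ECG, str, str]     # (ecg, question, reference answer)
Model = Callable[[ECG, str], str]  # hypothetical ELM interface


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted answer matches the reference."""
    correct = sum(model(ecg, q) == a for ecg, q, a in data)
    return correct / len(data)


def zero_ecg(ecg: ECG) -> ECG:
    """Replace every sample with 0.0 while keeping the leads x samples shape."""
    return [[0.0] * len(lead) for lead in ecg]


def ablation_gap(model: Model, data: Sequence[Example],
                 ablate: Callable[[ECG], ECG] = zero_ecg) -> float:
    """Accuracy with real ECGs minus accuracy with ablated ECGs.

    A gap near zero suggests the correct answers barely depend on the signal,
    i.e. they come from the question text or dataset regularities.
    """
    ablated = [(ablate(ecg), q, a) for ecg, q, a in data]
    return accuracy(model, data) - accuracy(model, ablated)


# Toy models standing in for real ELMs (hypothetical).
def prior_only(ecg: ECG, question: str) -> str:
    return "normal"  # ignores the signal and always predicts the majority answer


def signal_based(ecg: ECG, question: str) -> str:
    peak = max(abs(x) for lead in ecg for x in lead)
    return "abnormal" if peak > 1.0 else "normal"


q = "Is this ECG normal?"
normal, abnormal = [[0.1, -0.2, 0.3]], [[0.1, 2.5, -0.3]]
data = [(normal, q, "normal"), (abnormal, q, "abnormal"), (normal, q, "normal")]

print(ablation_gap(prior_only, data))              # 0.0
print(round(ablation_gap(signal_based, data), 3))  # 0.333
```

Blanking the signal moves the input off the training distribution, so a large gap shows that a model reacts to the ECG but not that it reads it correctly. Shuffling ECGs across examples could serve as a complementary control.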