Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs
This work addresses ECG analysis in under-resourced settings by enabling effective use of arbitrary lead inputs, though its contribution is incremental because it builds on existing multimodal learning approaches.
The paper tackles suboptimal alignment in multimodal ECG representation learning, which stems from the complexity of medical language and the reliance on full 12-lead setups. It proposes K-MERL, which uses large language models to extract structured knowledge from reports and dynamic lead masking to handle arbitrary lead inputs. K-MERL achieves state-of-the-art performance, with an average 16% AUC improvement over existing methods in partial-lead zero-shot classification.
Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
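To make the idea of dynamic lead masking concrete, the following is a minimal sketch of how a random subset of leads might be hidden from the encoder during training. It is not the paper's implementation. The function name `dynamic_lead_mask`, the `min_keep` parameter, the uniform choice of how many leads to keep, and zero-filling of masked leads are all assumptions made for illustration.

```python
import random

# Standard 12-lead ordering, used here only for readable output.
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]


def dynamic_lead_mask(signal, rng, min_keep=1):
    """Keep a random subset of leads and zero out the rest.

    signal: list of per-lead sample lists (one inner list per lead).
    rng: a random.Random instance, passed in for reproducibility.
    min_keep: smallest number of leads to keep (hypothetical parameter).
    Returns (masked_signal, keep_flags).
    """
    n_leads = len(signal)
    # Assumption: the number of kept leads is drawn uniformly.
    n_keep = rng.randint(min_keep, n_leads)  # inclusive on both ends
    kept = set(rng.sample(range(n_leads), n_keep))
    keep_flags = [i in kept for i in range(n_leads)]
    # Assumption: masked leads are zero-filled rather than dropped,
    # so the input shape stays fixed at (n_leads, n_samples).
    masked = [
        list(lead) if keep else [0.0] * len(lead)
        for lead, keep in zip(signal, keep_flags)
    ]
    return masked, keep_flags


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy signal: 12 leads, 4 samples each.
    toy = [[float(i + 1)] * 4 for i in range(len(LEAD_NAMES))]
    masked, flags = dynamic_lead_mask(toy, rng)
    visible = [name for name, keep in zip(LEAD_NAMES, flags) if keep]
    print("visible leads:", visible)
```

Because a different subset is drawn on every call, an encoder trained on such inputs sees many lead combinations. Under these assumptions, the returned `keep_flags` could also be passed to a lead-aware encoder so it knows which leads carry real signal.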