LG, AI, SP, TO | Sep 12, 2025

Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms

arXiv:2509.10369v1 | h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenge of developing clinically fair and generalizable foundation models for electrocardiogram analysis. It is an incremental advance: it builds on existing contrastive learning methods by investigating the effects of cohort composition.

The study examines how data distribution affects the performance and generalizability of contrastive learning-based foundation models for electrocardiograms. It finds that diverse pretraining cohorts improve in-distribution accuracy but reduce out-of-distribution generalization, and it proposes a strategy to enhance robustness.

Contrastive learning is a widely adopted self-supervised pretraining strategy, yet its dependence on cohort composition remains underexplored. We present the Contrasting by Patient Augmented Electrocardiograms (CAPE) foundation model and pretrain it on four cohorts (n = 5,203,352) from diverse populations across three continents (North America, South America, Asia). We systematically assess how cohort demographics, health status, and population diversity influence downstream performance on prediction tasks, with evaluation also covering two additional cohorts from another continent (Europe). We find that downstream performance depends on the distributional properties of the pretraining cohort, including demographics and health status. Moreover, while pretraining with a multi-centre, demographically diverse cohort improves in-distribution accuracy, it reduces out-of-distribution (OOD) generalisation of our contrastive approach by encoding cohort-specific artifacts. To address this, we propose the In-Distribution Batch (IDB) strategy, which preserves intra-cohort consistency during pretraining and enhances OOD robustness. This work provides important insights for developing clinically fair and generalisable foundation models.
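
The abstract does not spell out how the In-Distribution Batch (IDB) strategy is implemented. As a rough illustration only, the sketch below assumes that "intra-cohort consistency" means every pretraining batch is drawn from a single cohort, so that in-batch negatives share a cohort and cohort-specific artifacts presumably cannot help tell them apart. The function name `in_distribution_batches`, its parameters, and the cohort names are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def in_distribution_batches(cohort_labels, batch_size, seed=0):
    """Return batches of sample indices, each drawn from exactly one cohort.

    cohort_labels[i] is the cohort name of sample i. This is an assumed
    reading of the IDB idea, not the authors' implementation.
    """
    rng = random.Random(seed)

    # Group sample indices by the cohort they belong to.
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(idx)

    batches = []
    for indices in by_cohort.values():
        rng.shuffle(indices)
        # Drop the incomplete tail so every batch has the same size.
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batches.append(indices[start:start + batch_size])

    # Shuffle batch order so cohorts are interleaved across training steps.
    rng.shuffle(batches)
    return batches


if __name__ == "__main__":
    # Hypothetical toy data: two cohorts named "A" and "B".
    labels = ["A"] * 6 + ["B"] * 5
    for batch in in_distribution_batches(labels, batch_size=2):
        print(batch, {labels[i] for i in batch})
```

Each printed batch contains indices from one cohort only. A conventional sampler would instead shuffle all indices together, mixing cohorts inside a batch.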

