Improving Myocardial Infarction Detection via Synthetic ECG Pretraining
This work addresses the scarcity of labeled clinical data for automated ECG-based diagnosis of myocardial infarction and reports an incremental improvement from pretraining on synthetic data.
The paper tackles the shortage of labeled data for deep learning models that detect myocardial infarction from ECGs. It proposes a physiology-aware pipeline that synthesizes realistic 12-lead ECGs and uses them to pretrain classifiers, yielding AUC gains of up to 4 percentage points, particularly in low-data settings.
Myocardial infarction is a major cause of death globally, and accurate early diagnosis from electrocardiograms (ECGs) remains a clinical priority. Deep learning models have shown promise for automated ECG interpretation, but they require large amounts of labeled data, which are often scarce in practice. We propose a physiology-aware pipeline that (i) synthesizes 12-lead ECGs with tunable MI morphology and realistic noise, and (ii) pretrains recurrent and transformer classifiers with self-supervised masked autoencoding plus a joint reconstruction-classification objective. We validate the realism of the synthetic ECGs via statistical and visual analysis, confirming that key morphological features are preserved. Pretraining on synthetic data consistently improved classification performance, particularly in low-data settings, with AUC gains of up to 4 percentage points. These results show that controlled synthetic ECGs can help improve MI detection when real clinical data are limited.
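As an illustration of what a joint reconstruction-classification objective over masked ECGs might look like, the following is a minimal NumPy sketch, not the authors' implementation. The patch-wise masking scheme, the mask ratio, the mean-squared-error reconstruction term, the binary cross-entropy classification term, the weight `lam`, and the function names `random_patch_mask` and `joint_loss` are all assumptions made for this example; the abstract does not specify any of them.

```python
import numpy as np


def random_patch_mask(n_samples, patch_len, mask_ratio, rng):
    """Return a boolean mask over time; True marks samples hidden from the encoder.

    Assumption: masking is applied in contiguous patches shared across all leads.
    """
    n_patches = n_samples // patch_len
    n_masked = int(round(mask_ratio * n_patches))
    masked_patches = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_samples, dtype=bool)
    for p in masked_patches:
        mask[p * patch_len:(p + 1) * patch_len] = True
    return mask


def joint_loss(ecg, recon, mask, logit, label, lam=0.5):
    """Masked reconstruction MSE plus a weighted binary cross-entropy term.

    ecg, recon: arrays of shape (leads, samples)
    mask:       boolean array of shape (samples,), with at least one True entry
    logit:      scalar MI logit produced by a (hypothetical) classifier head
    label:      0 or 1
    lam:        assumed weight balancing the two terms
    """
    # Reconstruction error is measured only on the masked samples.
    recon_err = np.mean((ecg[:, mask] - recon[:, mask]) ** 2)
    # Numerically stable binary cross-entropy computed from the logit.
    bce = np.maximum(logit, 0.0) - logit * label + np.log1p(np.exp(-abs(logit)))
    return recon_err + lam * bce, recon_err, bce


rng = np.random.default_rng(0)
mask = random_patch_mask(n_samples=1000, patch_len=50, mask_ratio=0.4, rng=rng)
ecg = np.zeros((12, 1000))    # placeholder 12-lead signal
recon = np.ones((12, 1000))   # placeholder decoder output
total, recon_err, bce = joint_loss(ecg, recon, mask, logit=0.0, label=1)
```

In a setup of this kind, the reconstruction term would dominate during self-supervised pretraining on synthetic ECGs, while the classification term ties the learned representation to the MI label; how the two are actually weighted or scheduled is not stated in the abstract.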