ECG Classification on PTB-XL: A Data-Centric Approach with Simplified CNN-VAE
This work offers medical professionals an incremental improvement in ECG classification. It shows that simpler models, combined with careful data handling, can achieve competitive performance at a lower computational cost.
This paper addresses automated electrocardiogram (ECG) classification of cardiovascular diseases. The authors focus on data preprocessing and class balancing and use a simplified CNN-VAE with only 197,093 trainable parameters. On the PTB-XL dataset, they achieve 87.01% binary accuracy and a weighted F1-score of 0.7454 across five diagnostic classes.
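The abstract does not define how the two reported metrics are computed. The sketch below assumes the usual multi-label conventions. Binary accuracy follows Keras' `BinaryAccuracy`: the mean over every (record, class) decision, thresholded at 0.5. Weighted F1 follows scikit-learn's `average="weighted"`: per-class F1 weighted by the number of positive records. The function names and toy data are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: multi-label metrics for five PTB-XL superclasses.
# Assumes Keras-style binary accuracy and scikit-learn-style weighted F1.
from typing import List, Sequence

CLASSES = ["CD", "HYP", "MI", "NORM", "STTC"]


def binary_accuracy(y_true: Sequence[Sequence[int]],
                    y_prob: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> float:
    """Fraction of all individual (record, class) decisions that are correct."""
    correct = total = 0
    for t_row, p_row in zip(y_true, y_prob):
        for t, p in zip(t_row, p_row):
            correct += int((p > threshold) == bool(t))
            total += 1
    return correct / total


def weighted_f1(y_true: Sequence[Sequence[int]],
                y_prob: Sequence[Sequence[float]],
                threshold: float = 0.5) -> float:
    """Per-class F1, averaged with weights equal to each class's positive count."""
    n_classes = len(y_true[0])
    f1_sum, support_sum = 0.0, 0
    for c in range(n_classes):
        tp = fp = fn = support = 0
        for t_row, p_row in zip(y_true, y_prob):
            pred, true = p_row[c] > threshold, bool(t_row[c])
            support += int(true)
            if pred and true:
                tp += 1
            elif pred:
                fp += 1
            elif true:
                fn += 1
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        f1_sum += f1 * support
        support_sum += support
    return f1_sum / support_sum if support_sum else 0.0


# Toy example (columns follow CLASSES); record 3 is a missed HYP label.
y_true: List[List[int]] = [[1, 0, 0, 0, 0], [0, 0, 0, 1, 0], [0, 1, 1, 0, 0]]
y_prob: List[List[float]] = [[0.9, 0.1, 0.2, 0.3, 0.1],
                             [0.2, 0.1, 0.1, 0.8, 0.6],
                             [0.1, 0.4, 0.7, 0.2, 0.1]]
print(binary_accuracy(y_true, y_prob))  # 13/15 ≈ 0.867
print(weighted_f1(y_true, y_prob))      # 0.75
```

Binary accuracy also counts correctly rejected labels, so on imbalanced multi-label data it usually sits well above F1. In the toy example, the single missed HYP label leaves binary accuracy at 13/15, but it pulls weighted F1 down to 0.75.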
Automated electrocardiogram (ECG) classification is essential for the early detection of cardiovascular diseases. Recent approaches have increasingly relied on deep neural networks with complex architectures. We demonstrate that careful data preprocessing and class balancing, combined with a simplified architecture that pairs a convolutional neural network with a variational autoencoder (CNN-VAE), can achieve competitive performance with significantly reduced model complexity. On the publicly available PTB-XL dataset, we achieve 87.01% binary accuracy and a weighted F1-score of 0.7454 across five diagnostic classes (CD, HYP, MI, NORM, STTC) with only 197,093 trainable parameters. Our work emphasises the importance of data-centric machine learning practices over architectural complexity and shows that systematic preprocessing and balanced training strategies are critical for medical signal classification. We identify challenges in minority-class detection (particularly hypertrophy, HYP) and provide insights for future improvements in handling imbalanced ECG datasets.
Index Terms: ECG classification, convolutional neural networks, class balancing, data preprocessing, variational autoencoders, PTB-XL dataset
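The abstract credits class balancing for much of the result but does not say which strategy was used. The sketch below shows one common option for multi-label targets, purely as an assumption. Each class's positive term in the binary cross-entropy is up-weighted by that class's negative-to-positive ratio, in the same spirit as the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. With this weighting, a rare class such as HYP contributes more to the loss when it is missed. The function names are hypothetical.

```python
# Hypothetical sketch of one class-balancing strategy for multi-label ECG targets.
# Assumption: the paper's actual balancing method is not specified in the abstract.
import math
from typing import List, Sequence


def pos_weights(y_true: Sequence[Sequence[int]]) -> List[float]:
    """Per-class weight = (#negatives / #positives); 1.0 if a class has no positives."""
    n = len(y_true)
    weights = []
    for c in range(len(y_true[0])):
        pos = sum(row[c] for row in y_true)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(y_true: Sequence[Sequence[int]],
                 y_prob: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with the positive term scaled per class."""
    total, count = 0.0, 0
    for t_row, p_row in zip(y_true, y_prob):
        for t, p, w in zip(t_row, p_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Class 0 is rare (1 of 4 records), class 1 is common (2 of 4 records).
labels = [[1, 0], [0, 0], [0, 1], [0, 1]]
w = pos_weights(labels)
print(w)  # [3.0, 1.0]
print(weighted_bce(labels, [[0.5, 0.5]] * 4, w))
```

Oversampling minority-class records is a common alternative. Loss weighting keeps the dataset unchanged and only rescales gradients, which makes it cheap to combine with a small model.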