QM, CV, SP, AP, ML | Oct 21, 2019

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

arXiv:1910.09570v1 | 41 citations
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for large-scale ECG data for arrhythmia analysis in medical AI. On the methods side the contribution is incremental, since it builds on existing representation learning methods.

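The authors release Icentia11K, which they describe as the largest public ECG dataset of continuous raw signals, with 11 thousand patients and 2 billion labelled beats. It is intended to enable semi-supervised ECG models and the discovery of unknown arrhythmia subtypes. A qualitative evaluation of PCA embeddings shows some clustering of known subtypes, which the authors take as an indication of the potential of representation learning for subtype discovery.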

We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.
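The qualitative PCA evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code and uses no Icentia11K data: the feature vectors are synthetic stand-ins for learned beat representations, and the function name `pca_embed`, the two synthetic groups, and the 16-dimensional feature size are all assumptions made for illustration.

```python
import numpy as np


def pca_embed(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = centered @ vt[:n_components].T
    # Fraction of total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return embedding, explained[:n_components]


rng = np.random.default_rng(0)

# Synthetic stand-ins for learned beat representations (NOT Icentia11K data).
# Two groups with different means play the role of two known beat subtypes.
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 16))
group_b = rng.normal(loc=4.0, scale=1.0, size=(100, 16))
features = np.vstack([group_a, group_b])
labels = np.array([0] * 100 + [1] * 100)

embedding, explained = pca_embed(features)

# If the representation separates the groups, their centroids in the
# 2-D embedding should lie far apart relative to the within-group spread.
centroid_a = embedding[labels == 0].mean(axis=0)
centroid_b = embedding[labels == 1].mean(axis=0)
centroid_distance = np.linalg.norm(centroid_a - centroid_b)

print("embedding shape:", embedding.shape)
print("explained variance ratio:", explained)
print("centroid distance:", centroid_distance)
```

In practice the rows of `features` would come from a trained feature extractor rather than a random generator, and the 2-D embedding would be plotted and coloured by known labels to check visually for clustering.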

Code Implementations: 1 repo
