Weak Supervision for Affordable Modeling of Electrocardiogram Data
This work addresses the cost and effort of annotating ECG data for medical diagnosis. Its contribution is incremental, since it applies existing weak supervision concepts to a new domain.
The paper tackles the problem of expensive manual annotation for ECG heartbeat classification by combining multiple weak supervision sources in the form of human-designed time-series heuristics. With as few as six such heuristics, it generates probabilistic labels for over 100,000 heartbeats and trains competitive classifiers without ground-truth labels on individual heartbeats.
Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. So far, ECG studies that use machine learning to automatically detect abnormal heartbeats have depended on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. We explore the use of multiple weak supervision sources to learn diagnostic models of abnormal heartbeats via human-designed heuristics, without using ground-truth labels on individual data points. Our work is among the first to define weak supervision sources directly on time series data. Results show that with as few as six intuitive time series heuristics, we are able to infer high-quality probabilistic label estimates for over 100,000 heartbeats with little human effort, and use the estimated labels to train competitive classifiers evaluated on held-out test data.
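To make the idea of heuristic weak supervision concrete, the following is a minimal sketch of how a few labeling heuristics could be combined into probabilistic labels for heartbeats. It is not the authors' method: the three heuristics, their thresholds, the per-beat features (`rr_prev`, `qrs_width`, both in seconds), and the toy data are all hypothetical, and a plain unweighted vote share stands in for whatever label model the paper actually uses to weigh its sources.

```python
from statistics import median

# Vote values; ABSTAIN means the heuristic has no opinion on this beat.
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


def lf_short_rr(beat, median_rr):
    """Hypothetical heuristic: a beat arriving much earlier than usual is suspicious."""
    if beat["rr_prev"] < 0.8 * median_rr:
        return ABNORMAL
    return ABSTAIN


def lf_wide_qrs(beat, median_rr):
    """Hypothetical heuristic: wide QRS votes ABNORMAL, narrow QRS votes NORMAL."""
    if beat["qrs_width"] > 0.12:
        return ABNORMAL
    if beat["qrs_width"] < 0.10:
        return NORMAL
    return ABSTAIN


def lf_regular_rhythm(beat, median_rr):
    """Hypothetical heuristic: an RR interval near the record's median votes NORMAL."""
    if abs(beat["rr_prev"] - median_rr) < 0.05 * median_rr:
        return NORMAL
    return ABSTAIN


LABELING_FUNCTIONS = [lf_short_rr, lf_wide_qrs, lf_regular_rhythm]


def probabilistic_labels(beats):
    """Return an estimate of P(abnormal) per beat as the share of ABNORMAL votes.

    Abstentions are ignored; a beat with no votes gets an uninformative 0.5.
    This is an assumption-laden stand-in for a learned label model.
    """
    median_rr = median(b["rr_prev"] for b in beats)
    probs = []
    for beat in beats:
        votes = [lf(beat, median_rr) for lf in LABELING_FUNCTIONS]
        cast = [v for v in votes if v != ABSTAIN]
        probs.append(sum(cast) / len(cast) if cast else 0.5)
    return probs


# Toy record: one feature dictionary per heartbeat (made-up values).
beats = [
    {"rr_prev": 0.80, "qrs_width": 0.08},
    {"rr_prev": 0.82, "qrs_width": 0.09},
    {"rr_prev": 0.50, "qrs_width": 0.14},  # early and wide: both heuristics fire
    {"rr_prev": 0.81, "qrs_width": 0.11},
    {"rr_prev": 0.79, "qrs_width": 0.13},  # heuristics disagree
]
labels = probabilistic_labels(beats)
print(labels)
```

The resulting soft labels could then serve as training targets for a downstream classifier in place of manual annotations. A learned label model would typically go further than this vote share by estimating how accurate and how correlated the individual heuristics are.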