QM LG SP ML · Oct 31, 2019

Dreem Open Datasets: Multi-Scored Sleep Datasets to compare Human and Automated sleep staging

Antoine Guillot, Fabien Sauvet, Emmanuel H During, Valentin Thorey

arXiv:1911.03221v414.5141 citationsHas Code

Originality: Incremental advance

AI Analysis

This work addresses the resource-intensive task of sleep disorder diagnosis by providing datasets and a framework to improve automated sleep staging, potentially aiding clinicians, though it is incremental as it builds on existing methods.

The study tackled automated sleep stage classification by introducing two multi-scored datasets (DOD-H and DOD-O) and benchmarking existing and new methods against a consensus of human scorers. The results showed that many methods, including the authors' new SimpleSleepNet, reached human-level performance. SimpleSleepNet achieved an F1 of 89.9% on healthy volunteers (DOD-H), against an average of 86.8% for human scorers, and an F1 of 88.3% on patients with obstructive sleep apnea (DOD-O), against 84.8% for human scorers.

Sleep stage classification constitutes an important element of sleep disorder diagnosis. It relies on the visual inspection of polysomnography records by trained sleep technologists. Automated approaches have been designed to alleviate this resource-intensive task. However, such approaches are usually compared to a single human scorer's annotation, despite an inter-rater agreement of only about 85 %. The present study introduces two publicly available datasets: DOD-H, including 25 healthy volunteers, and DOD-O, including 55 patients suffering from obstructive sleep apnea (OSA). Both datasets have been scored by 5 sleep technologists from different sleep centers. We developed a framework to compare automated approaches to a consensus of multiple human scorers. Using this framework, we benchmarked and compared the main literature approaches. We also developed and benchmarked a new deep learning method, SimpleSleepNet, inspired by the current state of the art. We demonstrated that many methods can reach human-level performance on both datasets. SimpleSleepNet achieved an F1 of 89.9 % vs 86.8 % on average for human scorers on DOD-H, and an F1 of 88.3 % vs 84.8 % on DOD-O. Our study highlights that state-of-the-art automated sleep staging outperforms human scorers for both healthy volunteers and patients suffering from OSA. Consideration could be given to using automated approaches in the clinical setting.
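The idea of scoring a method against a consensus of several human scorers, rather than against a single annotation, can be illustrated with a minimal sketch. This is not the paper's framework. The per-epoch majority vote, the tie-break rule, the unweighted (macro) averaging of F1, the stage labels, and the toy data are all assumptions made for illustration, and the names `majority_consensus` and `macro_f1` are hypothetical.

```python
from collections import Counter

# Stage labels assumed for illustration (five-stage scoring).
STAGES = ["W", "N1", "N2", "N3", "REM"]


def majority_consensus(scorings):
    """Per-epoch majority vote over several scorers' hypnograms.

    Ties go to the earliest scorer in the list whose stage reaches the
    top count. This tie-break is an illustrative assumption.
    """
    consensus = []
    for votes in zip(*scorings):
        counts = Counter(votes)
        top = max(counts.values())
        consensus.append(next(v for v in votes if counts[v] == top))
    return consensus


def macro_f1(reference, predicted, labels=STAGES):
    """Unweighted mean of per-stage F1, skipping stages absent from both."""
    scores = []
    for label in labels:
        pairs = list(zip(reference, predicted))
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        if tp + fp + fn == 0:
            continue  # stage never appears in either sequence
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy data: 5 scorers, 6 epochs each (made up, not from DOD-H or DOD-O).
scorings = [
    ["W", "N1", "N2", "N2", "N3", "REM"],
    ["W", "N1", "N2", "N2", "N3", "REM"],
    ["W", "N2", "N2", "N3", "N3", "REM"],
    ["W", "N1", "N2", "N2", "N2", "REM"],
    ["N1", "N2", "N2", "N2", "N3", "W"],
]
consensus = majority_consensus(scorings)
model_output = ["W", "N2", "N2", "N2", "N3", "REM"]  # hypothetical model prediction

print(consensus)
print(round(macro_f1(consensus, model_output), 3))  # 0.76
```

To score a human in a comparable way, one option is to build the consensus from the other four scorers only, so that a scorer never votes on their own reference.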
