SPAug 18, 2022
RRWaveNet: A Compact End-to-End Multi-Scale Residual CNN for Robust PPG Respiratory Rate EstimationPongpanut Osathitporn, Guntitat Sawadwuthikul, Punnawish Thuwajit et al.
Respiratory rate (RR) is an important biomarker as RR changes can reflect severe medical events such as heart disease, lung disease, and sleep disorders. Unfortunately, standard manual RR counting is prone to human error and cannot be performed continuously. This study proposes a method for continuously estimating RR, RRWaveNet. The method is a compact end-to-end deep learning model which does not require feature engineering and can use low-cost raw photoplethysmography (PPG) as input signal. RRWaveNet was tested subject-independently and compared to baseline in four datasets (BIDMC, CapnoBase, WESAD, and SensAI) and using three window sizes (16, 32, and 64 seconds). RRWaveNet outperformed current state-of-the-art methods with mean absolute errors at optimal window size of 1.66 \pm 1.01, 1.59 \pm 1.08, 1.92 \pm 0.96 and 1.23 \pm 0.61 breaths per minute for each dataset. In remote monitoring settings, such as in the WESAD and SensAI datasets, we apply transfer learning to improve the performance using two other ICU datasets as pretraining datasets, reducing the MAE by up to 21$\%$. This shows that this model allows accurate and practical estimation of RR on affordable and wearable devices. Our study also shows feasibility of remote RR monitoring in the context of telemedicine and at home.
LGDec 10, 2025
Stanford Sleep Bench: Evaluating Polysomnography Pre-training Methods for Sleep Foundation ModelsMagnus Ruud Kjaer, Rahul Thapa, Gauri Ganjoo et al.
Polysomnography (PSG), the gold standard test for sleep analysis, generates vast amounts of multimodal clinical data, presenting an opportunity to leverage self-supervised representation learning (SSRL) for pre-training foundation models to enhance sleep analysis. However, progress in sleep foundation models is hindered by two key limitations: (1) the lack of a shared dataset and benchmark with diverse tasks for training and evaluation, and (2) the absence of a systematic evaluation of SSRL approaches across sleep-related tasks. To address these gaps, we introduce Stanford Sleep Bench, a large-scale PSG dataset comprising 17,467 recordings totaling over 163,000 hours from a major sleep clinic, including 13 clinical disease prediction tasks alongside canonical sleep-related tasks such as sleep staging, apnea diagnosis, and age estimation. We systematically evaluate SSRL pre-training methods on Stanford Sleep Bench, assessing downstream performance across four tasks: sleep staging, apnea diagnosis, age estimation, and disease and mortality prediction. Our results show that multiple pretraining methods achieve comparable performance for sleep staging, apnea diagnosis, and age estimation. However, for mortality and disease prediction, contrastive learning significantly outperforms other approaches while also converging faster during pretraining. To facilitate reproducibility and advance sleep research, we will release Stanford Sleep Bench along with pretrained model weights, training pipelines, and evaluation code.
LGMay 4
Pretraining on Sleep Data Improves non-Sleep Biosignal TasksWilliam Lehn-Schiøler, Magnus Ruud Kjær, Phillip Hempel et al.
Sleep foundation models have recently demonstrated strong performance on in-domain polysomnography tasks, including sleep staging, apnea detection, and disease risk prediction. In this work, we investigate whether sleep biosignals can serve as an effective pretraining distribution for learning representations that transfer beyond sleep to adjacent domains. Following sleep foundation models, we perform sleep-only multimodal contrastive pretraining (with a leave-one-out objective) and evaluate transfer to non-sleep EEG and ECG, two well-benchmarked biosignal modalities with heterogeneous datasets and clinically meaningful downstream tasks. Across eight downstream tasks spanning multiple EEG and ECG datasets, sleep pretraining consistently improves performance relative to training from scratch. Moreover, on several tasks, we achieve performance competitive with or surpassing prior specialized state-of-the-art and foundation models.
SPFeb 17, 2025
Frequency-Aware Masked Autoencoders for Human Activity Recognition using AccelerometersNiels R. Lorenzen, Poul J. Jennum, Emmanuel Mignot et al.
Wearable accelerometers are widely used for continuous monitoring of physical activity. Supervised machine learning and deep learning algorithms have long been used to extract meaningful activity information from raw accelerometry data, but progress has been hampered by the limited amount of labeled data that is publicly available. Exploiting large unlabeled datasets using self-supervised pretraining is a relatively new and underexplored approach in the field of human activity recognition (HAR). We used a time-series transformer masked autoencoder (MAE) approach to self-supervised pretraining and propose two novel spectrogram-based loss functions: the log-scale meanmagnitude (LMM) and log-scale magnitude variance (LMV) losses. We compared these losses with the mean squared error (MSE) loss for MAE training. We leveraged the large unlabeled UK Biobank accelerometry dataset (n = 109k) for pretraining and evaluated downstream HAR performance using a linear classifier in a smaller labelled dataset. We found that pretraining with the LMM loss improved performance compared to an MAE pretrained with the MSE loss, with 12.7% increase in subject-wise F1 score when using linear probing. Compared with a state-of-the-art ResNet-based HAR model, our LMM-pretrained transformer models performed better (+9.8% F1) with linear probing and comparably when fine-tuned using an LSTM classifier. The addition of the LMV to the LMM loss decreased performance compared to the LMM loss alone. These findings establish the LMM loss as a robust and effective method for pretraining MAE models on accelerometer data for HAR and show the potential of pretraining sequence-based models for free-living HAR.
CVJan 7, 2021
MSED: a multi-modal sleep event detection model for clinical sleep analysisAlexander Neergaard Olesen, Poul Jennum, Emmanuel Mignot et al.
Study objective: Clinical sleep analysis require manual analysis of sleep patterns for correct diagnosis of sleep disorders. Several studies show significant variability in scoring discrete sleep events. We wished to investigate, whether an automatic method could be used for detection of arousals (Ar), leg movements (LM) and sleep disordered breathing (SDB) events, and if the joint detection of these events performed better than having three separate models. Methods: We designed a single deep neural network architecture to jointly detect sleep events in a polysomnogram. We trained the model on 1653 recordings of individuals, and tested the optimized model on 1000 separate recordings. The performance of the model was quantified by F1, precision, and recall scores, and by correlating index values to clinical values using Pearson's correlation coefficient. Results: F1 scores for the optimized model was 0.70, 0.63, and 0.62 for Ar, LM, and SDB, respectively. The performance was higher, when detecting events jointly compared to corresponding single-event models. Index values computed from detected events correlated well with manual annotations ($r^2$ = 0.73, $r^2$ = 0.77, $r^2$ = 0.78, respectively). Conclusion: Detecting arousals, leg movements and sleep disordered breathing events jointly is possible, and the computed index values correlates well with human annotations.
CVAug 21, 2020
Automatic sleep stage classification with deep residual networks in a mixed-cohort settingAlexander Neergaard Olesen, Poul Jennum, Emmanuel Mignot et al.
Study Objectives: Sleep stage scoring is performed manually by sleep experts and is prone to subjective interpretation of scoring rules with low intra- and interscorer reliability. Many automatic systems rely on few small-scale databases for developing models, and generalizability to new datasets is thus unknown. We investigated a novel deep neural network to assess the generalizability of several large-scale cohorts. Methods: A deep neural network model was developed using 15684 polysomnography studies from five different cohorts. We applied four different scenarios: 1) impact of varying time-scales in the model; 2) performance of a single cohort on other cohorts of smaller, greater or equal size relative to the performance of other cohorts on a single cohort; 3) varying the fraction of mixed-cohort training data compared to using single-origin data; and 4) comparing models trained on combinations of data from 2, 3, and 4 cohorts. Results: Overall classification accuracy improved with increasing fractions of training data (0.25$\%$: 0.782 $\pm$ 0.097, 95$\%$ CI [0.777-0.787]; 100$\%$: 0.869 $\pm$ 0.064, 95$\%$ CI [0.864-0.872]), and with increasing number of data sources (2: 0.788 $\pm$ 0.102, 95$\%$ CI [0.787-0.790]; 3: 0.808 $\pm$ 0.092, 95$\%$ CI [0.807-0.810]; 4: 0.821 $\pm$ 0.085, 95$\%$ CI [0.819-0.823]). Different cohorts show varying levels of generalization to other cohorts. Conclusions: Automatic sleep stage scoring systems based on deep learning algorithms should consider as much data as possible from as many sources available to ensure proper generalization. Public datasets for benchmarking should be made available for future research.
CVApr 10, 2020
Deep transfer learning for improving single-EEG arousal detectionAlexander Neergaard Olesen, Poul Jennum, Emmanuel Mignot et al.
Datasets in sleep science present challenges for machine learning algorithms due to differences in recording setups across clinics. We investigate two deep transfer learning strategies for overcoming the channel mismatch problem for cases where two datasets do not contain exactly the same setup leading to degraded performance in single-EEG models. Specifically, we train a baseline model on multivariate polysomnography data and subsequently replace the first two layers to prepare the architecture for single-channel electroencephalography data. Using a fine-tuning strategy, our model yields similar performance to the baseline model (F1=0.682 and F1=0.694, respectively), and was significantly better than a comparable single-channel model. Our results are promising for researchers working with small databases who wish to use deep learning models pre-trained on larger databases.
SPMay 16, 2019
Towards a Flexible Deep Learning Method for Automatic Detection of Clinically Relevant Multi-Modal Events in the PolysomnogramAlexander Neergaard Olesen, Stanislas Chambon, Valentin Thorey et al.
Much attention has been given to automatic sleep staging algorithms in past years, but the detection of discrete events in sleep studies is also crucial for precise characterization of sleep patterns and possible diagnosis of sleep disorders. We propose here a deep learning model for automatic detection and annotation of arousals and leg movements. Both of these are commonly seen during normal sleep, while an excessive amount of either is linked to disrupted sleep patterns, excessive daytime sleepiness impacting quality of life, and various sleep disorders. Our model was trained on 1,485 subjects and tested on 1,000 separate recordings of sleep. We tested two different experimental setups and found optimal arousal detection was attained by including a recurrent neural network module in our default model with a dynamic default event window (F1 = 0.75), while optimal leg movement detection was attained using a static event window (F1 = 0.65). Our work show promise while still allowing for improvements. Specifically, future research will explore the proposed model as a general-purpose sleep analysis model.
NCMar 15, 2019
Automatic Detection of Cortical Arousals in Sleep and their Contribution to Daytime SleepinessAndreas Brink-Kjaer, Alexander Neergaard Olesen, Paul E. Peppard et al.
Cortical arousals are transient events of disturbed sleep that occur spontaneously or in response to stimuli such as apneic events. The gold standard for arousal detection in human polysomnographic recordings (PSGs) is manual annotation by expert human scorers, a method with significant interscorer variability. In this study, we developed an automated method, the Multimodal Arousal Detector (MAD), to detect arousals using deep learning methods. The MAD was trained on 2,889 PSGs to detect both cortical arousals and wakefulness in 1 second intervals. Furthermore, the relationship between MAD-predicted labels on PSGs and next day mean sleep latency (MSL) on a multiple sleep latency test (MSLT), a reflection of daytime sleepiness, was analyzed in 1447 MSLT instances in 873 subjects. In a dataset of 1,026 PSGs, the MAD achieved a F1 score of 0.76 for arousal detection, while wakefulness was predicted with an accuracy of 0.95. In 60 PSGs scored by multiple human expert technicians, the MAD significantly outperformed the average human scorer for arousal detection with a difference in F1 score of 0.09. After controlling for other known covariates, a doubling of the arousal index was associated with an average decrease in MSL of 40 seconds ($β$ = -0.67, p = 0.0075). The MAD outperformed the average human expert and the MAD-predicted arousals were shown to be significant predictors of MSL, which demonstrate clinical validity the MAD.
SPDec 7, 2018
DOSED: a deep learning approach to detect multiple sleep micro-events in EEG signalStanislas Chambon, Valentin Thorey, Pierrick J. Arnal et al.
Background: Electroencephalography (EEG) monitors brain activity during sleep and is used to identify sleep disorders. In sleep medicine, clinicians interpret raw EEG signals in so-called sleep stages, which are assigned by experts to every 30s window of signal. For diagnosis, they also rely on shorter prototypical micro-architecture events which exhibit variable durations and shapes, such as spindles, K-complexes or arousals. Annotating such events is traditionally performed by a trained sleep expert, making the process time consuming, tedious and subject to inter-scorer variability. To automate this procedure, various methods have been developed, yet these are event-specific and rely on the extraction of hand-crafted features. New method: We propose a novel deep learning architecure called Dreem One Shot Event Detector (DOSED). DOSED jointly predicts locations, durations and types of events in EEG time series. The proposed approach, applied here on sleep related micro-architecture events, is inspired by object detectors developed for computer vision such as YOLO and SSD. It relies on a convolutional neural network that builds a feature representation from raw EEG signals, as well as two modules performing localization and classification respectively. Results and comparison with other methods: The proposed approach is tested on 4 datasets and 3 types of events (spindles, K-complexes, arousals) and compared to the current state-of-the-art detection algorithms. Conclusions: Results demonstrate the versatility of this new approach and improved performance compared to the current state-of-the-art detection methods.
CVOct 8, 2018
Deep residual networks for automatic sleep stage classification of raw polysomnographic waveformsAlexander Neergaard Olesen, Poul Jennum, Paul Peppard et al.
We have developed an automatic sleep stage classification algorithm based on deep residual neural networks and raw polysomnogram signals. Briefly, the raw data is passed through 50 convolutional layers before subsequent classification into one of five sleep stages. Three model configurations were trained on 1850 polysomnogram recordings and subsequently tested on 230 independent recordings. Our best performing model yielded an accuracy of 84.1% and a Cohen's kappa of 0.746, improving on previous reported results by other groups also using only raw polysomnogram data. Most errors were made on non-REM stage 1 and 3 decisions, errors likely resulting from the definition of these stages. Further testing on independent cohorts is needed to verify performance for clinical use.
SPJul 11, 2018
A deep learning architecture to detect events in EEG signals during sleepStanislas Chambon, Valentin Thorey, Pierrick J. Arnal et al.
Electroencephalography (EEG) during sleep is used by clinicians to evaluate various neurological disorders. In sleep medicine, it is relevant to detect macro-events (> 10s) such as sleep stages, and micro-events (<2s) such as spindles and K-complexes. Annotations of such events require a trained sleep expert, a time consuming and tedious process with a large inter-scorer variability. Automatic algorithms have been developed to detect various types of events but these are event-specific. We propose a deep learning method that jointly predicts locations, durations and types of events in EEG time series. It relies on a convolutional neural network that builds a feature representation from raw EEG signals. Numerical experiments demonstrate efficiency of this new approach on various event detection tasks compared to current state-of-the-art, event specific, algorithms.
NEOct 5, 2017
Neural network an1alysis of sleep stages enables efficient diagnosis of narcolepsyJens B. Stephansen, Alexander N. Olesen, Mads Olsen et al.
Analysis of sleep for the diagnosis of sleep disorders such as Type-1 Narcolepsy (T1N) currently requires visual inspection of polysomnography records by trained scoring technicians. Here, we used neural networks in approximately 3,000 normal and abnormal sleep recordings to automate sleep stage scoring, producing a hypnodensity graph - a probability distribution conveying more information than classical hypnograms. Accuracy of sleep stage scoring was validated in 70 subjects assessed by six scorers. The best model performed better than any individual scorer (87% versus consensus). It also reliably scores sleep down to 5 instead of 30 second scoring epochs. A T1N marker based on unusual sleep-stage overlaps achieved a specificity of 96% and a sensitivity of 91%, validated in independent datasets. Addition of HLA-DQB1*06:02 typing increased specificity to 99%. Our method can reduce time spent in sleep clinics and automates T1N diagnosis. It also opens the possibility of diagnosing T1N using home sleep studies.