LGQMFeb 10

Fully-automated sleep staging: multicenter validation of a generalizable deep neural network for Parkinson's disease and isolated REM sleep behavior disorder

arXiv:2602.09793v2h-index: 31
Originality Synthesis-oriented
AI Analysis

This work addresses a bottleneck in deploying sleep disorder screening technologies at scale for patients with neurodegenerative conditions, representing an incremental improvement with domain-specific impact.

The researchers tackled the challenge of manual sleep staging in neurodegenerative diseases like Parkinson's disease and isolated REM sleep behavior disorder by adapting a deep neural network, achieving a mean Cohen's κ of 0.74 after fine-tuning and improving REM sleep staging accuracy to 95.5% with confidence thresholds.

Isolated REM sleep behavior disorder (iRBD) is a key prodromal marker of Parkinson's disease (PD), and video-polysomnography (vPSG) remains the diagnostic gold standard. However, manual sleep staging is particularly challenging in neurodegenerative diseases due to EEG abnormalities and fragmented sleep, making PSG assessments a bottleneck for deploying new RBD screening technologies at scale. We adapted U-Sleep, a deep neural network, for generalizable sleep staging in PD and iRBD. A pretrained U-Sleep model, based on a large, multisite non-neurodegenerative dataset (PUB; 19,236 PSGs across 12 sites), was fine-tuned on research datasets from two centers (Lundbeck Foundation Parkinson's Disease Research Center (PACE) and the Cologne-Bonn Cohort (CBC); 112 PD, 138 iRBD, 89 age-matched controls. The resulting model was evaluated on an independent dataset from the Danish Center for Sleep Medicine (DCSM; 81 PD, 36 iRBD, 87 sleep-clinic controls). A subset of PSGs with low agreement between the human rater and the model (Cohen's $κ$ < 0.6) was re-scored by a second blinded human rater to identify sources of disagreement. Finally, we applied confidence-based thresholds to optimize REM sleep staging. The pretrained model achieved mean $κ$ = 0.81 in PUB, but $κ$ = 0.66 when applied directly to PACE/CBC. By fine-tuning the model, we developed a generalized model with $κ$ = 0.74 on PACE/CBC (p < 0.001 vs. the pretrained model). In DCSM, mean and median $κ$ increased from 0.60 to 0.64 (p < 0.001) and 0.64 to 0.69 (p < 0.001), respectively. In the interrater study, PSGs with low agreement between the model and the initial scorer showed similarly low agreement between human scorers. Applying a confidence threshold increased the proportion of correctly identified REM sleep epochs from 85% to 95.5%, while preserving sufficient (> 5 min) REM sleep for 95% of subjects.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes