Data-Efficient Sleep Staging with Synthetic Time Series Pretraining
This work addresses data efficiency in sleep staging for medical applications. It offers an incremental improvement over existing self-supervised methods by pretraining on synthetic data rather than on extensive empirical datasets.
The paper tackles sleep stage classification from EEG data when datasets are small or contain few subjects. It proposes frequency pretraining on synthetic time series, which outperforms fully supervised learning in data-limited scenarios and matches it when many subjects are available.
Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.
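To make the pretraining task concrete, the following is a minimal sketch of how one synthetic training example could be generated. The abstract does not specify the signal model or the label encoding, so everything here is an assumption: the signal is a sum of sine waves with random phases, the frequency range is split into equal-width bins, and the target is a multi-label vector marking which bins contributed a component. The function name `make_synthetic_example` and all default values (number of bins, frequency range, sampling rate, duration) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_synthetic_example(n_bins=20, f_min=0.5, f_max=30.0, fs=100.0,
                           duration_s=30.0, rng=None):
    """Return (signal, labels) for one synthetic pretraining example.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    # Split [f_min, f_max] into n_bins equal-width frequency bins.
    width = (f_max - f_min) / n_bins
    # Multi-label target: 1 if a sine from that bin is present, else 0.
    labels = [1 if rng.random() < 0.5 else 0 for _ in range(n_bins)]
    if not any(labels):
        labels[rng.randrange(n_bins)] = 1  # guarantee at least one component

    n_samples = int(fs * duration_s)
    signal = [0.0] * n_samples
    for b, present in enumerate(labels):
        if not present:
            continue
        # Draw one frequency and one phase uniformly within the active bin.
        freq = rng.uniform(f_min + b * width, f_min + (b + 1) * width)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        for i in range(n_samples):
            signal[i] += math.sin(2.0 * math.pi * freq * i / fs + phase)

    # Normalize to zero mean and unit variance so that amplitude
    # does not reveal how many components were summed.
    mean = sum(signal) / n_samples
    var = sum((x - mean) ** 2 for x in signal) / n_samples
    std = math.sqrt(var) or 1.0
    signal = [(x - mean) / std for x in signal]
    return signal, labels


# Usage: one example with a fixed seed for reproducibility.
signal, labels = make_synthetic_example(rng=random.Random(0))
print(len(signal), sum(labels))
```

Under these assumptions, a network would be pretrained to predict `labels` from `signal` (for instance with a per-bin binary cross-entropy loss) and then fine-tuned on real EEG epochs for sleep staging. Because examples are generated on the fly, the pretraining stage needs no empirical recordings.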