Designing Pre-training Datasets from Unlabeled Data for EEG Classification with Transformers
The work addresses the high cost of expert annotation in medical EEG analysis, for tasks such as epileptic seizure forecasting, though it is incremental: it adapts existing self-supervised pre-training methods to a specific domain.
The paper tackles the scarcity of labeled EEG data by building pre-training datasets from unlabeled EEG recordings. Compared with non-pre-trained models, the resulting models cut fine-tuning time by more than 50%, raise accuracy from 90.93% to 92.16%, and raise AUC from 0.9648 to 0.9702.
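To put the reported figures in perspective, the short computation below derives the absolute accuracy gain, the relative reduction in error rate, and the AUC gain from the numbers stated above. It is plain arithmetic on the reported values only; the variable names are illustrative, and it says nothing about statistical significance, which the summary does not report.

```python
# Figures as reported in the summary (non-pre-trained vs. pre-trained).
acc_base, acc_pre = 0.9093, 0.9216
auc_base, auc_pre = 0.9648, 0.9702

# Absolute accuracy gain, in percentage points.
acc_gain_pp = (acc_pre - acc_base) * 100

# Error rates and the relative reduction in error.
err_base, err_pre = 1 - acc_base, 1 - acc_pre
rel_err_reduction = (err_base - err_pre) / err_base

# Absolute AUC gain.
auc_gain = auc_pre - auc_base

print(f"accuracy gain: {acc_gain_pp:.2f} pp")                  # 1.23 pp
print(f"relative error reduction: {rel_err_reduction:.1%}")    # 13.6%
print(f"AUC gain: {auc_gain:.4f}")                             # 0.0054
```

Read this way, the 1.23-point accuracy gain corresponds to removing roughly one in seven of the baseline's errors.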
Transformer neural networks require a large amount of labeled data to train effectively. Such data is often scarce in electroencephalography, because annotations by medical experts are costly. This motivates self-supervised pre-training on unlabeled data before training on the labeled task. In this paper, we present a way to design several labeled datasets from unlabeled electroencephalogram (EEG) data. These can then be used to pre-train transformers to learn representations of EEG signals. We tested this method on an epileptic seizure forecasting task on the Temple University Seizure Detection Corpus using a Multi-channel Vision Transformer. Our results suggest that 1) models pre-trained using our approach train significantly faster, reducing fine-tuning duration for this task by more than 50%, and 2) compared with non-pre-trained models, pre-trained models achieve higher accuracy, rising from 90.93% to 92.16%, and a higher AUC, rising from 0.9648 to 0.9702.
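The abstract does not specify how labels are derived from unlabeled recordings. One well-known family of self-supervised approaches is transformation (pretext-task) recognition: each unlabeled window is altered by one of several known transformations, and the index of that transformation becomes its label. The sketch below illustrates that idea only, assuming NumPy and multi-channel recordings shaped `(channels, samples)`. The transformation set, the names `make_pretext_dataset` and `TRANSFORMS`, the 256-sample window, and the 19-channel input are hypothetical choices, not the paper's dataset design.

```python
import numpy as np

# Hypothetical pretext transformations (not the paper's set). Each maps a
# (channels, samples) window to a transformed window of the same shape.
def identity(x, rng):
    return x

def time_reverse(x, rng):
    return x[:, ::-1]

def sign_flip(x, rng):
    return -x

def shuffle_channels(x, rng):
    # Reorder channels with a random permutation.
    return x[rng.permutation(x.shape[0])]

def add_noise(x, rng, scale=0.1):
    # Gaussian noise scaled relative to the window's standard deviation.
    return x + rng.normal(0.0, scale * (x.std() + 1e-8), size=x.shape)

TRANSFORMS = [identity, time_reverse, sign_flip, shuffle_channels, add_noise]

def make_pretext_dataset(recordings, window, stride, seed=0):
    """Hypothetical helper: turn unlabeled (channels, samples) recordings
    into (X, y), where y[i] is the index of the transformation applied to X[i]."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for rec in recordings:
        n_samples = rec.shape[1]
        # Slide a fixed-length window over the recording.
        for start in range(0, n_samples - window + 1, stride):
            segment = rec[:, start:start + window]
            label = int(rng.integers(len(TRANSFORMS)))
            xs.append(TRANSFORMS[label](segment, rng))
            ys.append(label)
    return np.stack(xs).astype(np.float32), np.array(ys, dtype=np.int64)

# Usage with synthetic stand-ins for unlabeled 19-channel EEG recordings.
demo_rng = np.random.default_rng(1)
recordings = [demo_rng.standard_normal((19, 2560)),
              demo_rng.standard_normal((19, 1280))]
X, y = make_pretext_dataset(recordings, window=256, stride=256)
print(X.shape, y.shape)  # (15, 19, 256) (15,)
```

A transformer pre-trained to predict `y` from `X` could then be fine-tuned on the labeled seizure-forecasting data. Because the labels come for free from the transformations, any amount of unlabeled EEG can be turned into pre-training data this way.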