LG, AI · May 27

A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models

Aditya Kommineni, Emily Zhou, Kleanthis Avramidis, Tiantian Feng, Shrikanth Narayanan

arXiv:2605.28563

AI Analysis

For researchers and practitioners in neurotechnology and clinical applications, this work highlights the need for realistic evaluation protocols to assess EEG foundation models, revealing their limited robustness in short-window and channel-constrained settings.

The paper proposes a multi-dimensional evaluation framework for EEG foundation models under realistic low-resource conditions, finding that foundation models outperform supervised models on long-context tasks (e.g., sleep stage prediction) but not on short-window BCI tasks, where supervised models with fewer parameters achieve comparable performance.

Evaluating foundation models under appropriate adaptation settings is essential for understanding the quality and transferability of the learned representations. Recent EEG foundation models have demonstrated promising transfer capabilities across tasks and datasets, motivating their growing use in neurotechnology and clinical applications. However, these models are typically evaluated under full fine-tuning on well-curated downstream datasets, a setting that does not reflect biomedical domain constraints such as limited labeled data, reduced sensor coverage, or parameter-efficient adaptation. In this work, we propose a multi-dimensional evaluation framework for assessing EEG models under realistic low-resource conditions. Using this framework, we empirically analyze both supervised EEG models and recent EEG foundation models, including LaBraM, CSBrain, and CBraMod, across 6 datasets. We find that EEG foundation models consistently provide performance gains on long-context tasks such as sleep stage prediction and mental health state classification. In contrast, for short-window Brain-Computer Interface (BCI) style tasks, supervised models achieve comparable performance despite having substantially fewer parameters. Additional analyses demonstrate that current foundation models provide limited robustness in short-window and channel-constrained settings. Together, these findings motivate the use of multi-dimensional evaluation protocols that characterize model behavior under realistic use constraints.
