NeuralBench: A Unifying Framework to Benchmark NeuroAI Models
This framework addresses the need for standardized evaluation in neuroimaging AI, enabling fair comparisons and identifying bottlenecks for the research community.
NeuralBench provides a unified framework for benchmarking AI models of brain activity, releasing NeuralBench-EEG v1.0 with 36 EEG tasks and 14 architectures across 94 datasets. Key findings show foundation models only marginally outperform task-specific models, and many tasks remain highly challenging.
Deep learning and large public datasets have recently catalyzed the proliferation of AI models for processing brain recordings. However, systematically evaluating these models remains a challenge: preprocessing pipelines and training and finetuning approaches vary widely across studies, and downstream evaluation is often limited to small sets of tasks and/or datasets. Here, we present NeuralBench: a unified framework for benchmarking AI models of brain activity. We accompany this framework with NeuralBench-EEG v1.0, a large electroencephalography (EEG) benchmark that includes 36 tasks and 14 deep learning architectures, evaluated on 94 datasets accessed through a standardized interface. This first EEG-focused release already highlights two main findings. First, current foundation models only marginally outperform task-specific models. Second, a large set of tasks (e.g., cognitive decoding, clinical predictions) remains highly challenging, even for the best models. Critically, NeuralBench is designed for the integration of new tasks, datasets, models, and neuroimaging modalities, as illustrated by preliminary extensions to MEG and fMRI datasets and models. Through this white paper, we invite the community to expand this open-source framework and work together toward a unified benchmarking standard for neuroimaging models.
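
To make the idea of evaluating many models on many tasks through one standardized interface more concrete, the following is a minimal sketch in Python. It is not the NeuralBench API: every name here (`Task`, `register_task`, `register_model`, `run_benchmark`, and the two toy models) is hypothetical, and the data are small made-up feature vectors rather than brain recordings. The sketch only illustrates the general registry pattern, in which any registered model is trained and scored on any registered task by the same loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only; none of these names come from NeuralBench.
Example = Tuple[Sequence[float], int]            # (features, label)
Predictor = Callable[[Sequence[float]], int]     # trained model: features -> label
Trainer = Callable[[List[Example]], Predictor]   # training routine: data -> predictor


@dataclass
class Task:
    name: str
    train: List[Example]
    test: List[Example]


TASKS: Dict[str, Task] = {}
MODELS: Dict[str, Trainer] = {}


def register_task(task: Task) -> None:
    """Add a task to the shared registry."""
    TASKS[task.name] = task


def register_model(name: str, trainer: Trainer) -> None:
    """Add a model (as a training routine) to the shared registry."""
    MODELS[name] = trainer


def accuracy(predict: Predictor, test: List[Example]) -> float:
    """Fraction of test examples whose label is predicted correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def run_benchmark() -> Dict[Tuple[str, str], float]:
    """Train and score every registered model on every registered task."""
    results = {}
    for model_name, trainer in MODELS.items():
        for task_name, task in TASKS.items():
            predict = trainer(task.train)
            results[(model_name, task_name)] = accuracy(predict, task.test)
    return results


def majority_class(train: List[Example]) -> Predictor:
    """Toy baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def mean_threshold(train: List[Example]) -> Predictor:
    """Toy binary model: threshold the feature mean at the midpoint of class means.

    Assumes labels are 0/1 and both classes appear in the training data.
    """
    def feature_mean(x: Sequence[float]) -> float:
        return sum(x) / len(x)

    class_means = []
    for label in (0, 1):
        values = [feature_mean(x) for x, y in train if y == label]
        class_means.append(sum(values) / len(values))
    midpoint = (class_means[0] + class_means[1]) / 2
    return lambda x: int(feature_mean(x) > midpoint)


# Made-up toy data standing in for a real dataset.
register_task(Task(
    name="toy",
    train=[([0.0, 0.2], 0), ([0.1, 0.1], 0), ([0.2, 0.0], 0),
           ([0.9, 1.0], 1), ([1.0, 0.8], 1)],
    test=[([0.1, 0.0], 0), ([0.8, 0.9], 1), ([1.0, 1.0], 1), ([0.0, 0.1], 0)],
))
register_model("majority", majority_class)
register_model("threshold", mean_threshold)

if __name__ == "__main__":
    for (model_name, task_name), score in run_benchmark().items():
        print(f"{model_name:>10} on {task_name}: accuracy = {score:.2f}")
```

In a design like this, adding a new task or model requires only one registration call, and the evaluation loop stays unchanged. That is the property the abstract refers to when it describes a framework designed for the integration of new tasks, datasets, and models.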