CV · NC · Dec 26, 2025

SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis

Mo Wang, Junfeng Xia, Wenhao Ye, Enyu Liu, Kaining Peng, Jianfeng Feng, Quanying Liu, Hongkai Wen

arXiv:2512.21881v3 · 13.16 citations · h-index: 13

Originality: Highly original

AI Analysis

This work addresses efficiency challenges in fMRI analysis for neuroscience and medical research, offering a novel method that balances spatial fidelity with reduced computational demands.

The paper tackles the dual bottleneck of data- and training-efficiency in fMRI foundation models by introducing SLIM-Brain, an atlas-free model that achieves state-of-the-art performance across seven benchmarks while requiring only 4,000 pre-training sessions and approximately 30% of the GPU memory used by traditional voxel-level methods.

Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest; this reduces data dimensionality but discards fine-grained spatial detail, and such models require extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information, preserving spatial fidelity, but they are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that improves data- and training-efficiency simultaneously. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks data windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations only from the top-$k$ selected windows, while dropping the masked patches, about 70% of the total. Extensive experiments across seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance on diverse tasks while requiring only 4,000 pre-training sessions and approximately 30% of the GPU memory of traditional voxel-level methods.
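To make the two-stage idea concrete, here is a minimal, stdlib-only Python sketch of the data flow the abstract describes: rank temporal windows by a saliency score, keep the top-$k$, then drop roughly 70% of patches so the encoder only processes the visible ones. It is not the SLIM-Brain implementation. The function names (`split_windows`, `select_top_k_windows`, `drop_masked_patches`), the use of plain temporal variance as the saliency score (standing in for the learned temporal extractor), the 1D summary signal (instead of 4D voxel volumes), and the window, stride, and $k$ values are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_windows(signal: Sequence[float], window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for sliding windows over a time series."""
    return [(s, s + window) for s in range(0, len(signal) - window + 1, stride)]


def variance(xs: Sequence[float]) -> float:
    """Population variance; a hypothetical stand-in for a learned saliency score."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def select_top_k_windows(signal: Sequence[float], window: int, stride: int, k: int) -> List[Tuple[int, int]]:
    """Stage (i) sketch: score every window, keep the k most salient ones.

    Assumption: saliency = temporal variance of the window. The paper uses a
    lightweight learned temporal extractor instead.
    """
    spans = split_windows(signal, window, stride)
    ranked = sorted(spans, key=lambda se: variance(signal[se[0]:se[1]]), reverse=True)
    return sorted(ranked[:k])  # restore temporal order of the selected windows


def drop_masked_patches(num_patches: int, mask_ratio: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Stage (ii) sketch: MAE-style random masking.

    Returns (visible, masked) patch indices. Only the visible indices would be
    fed to the encoder, so about (1 - mask_ratio) of the tokens are processed.
    """
    rng = random.Random(seed)
    num_keep = max(1, round(num_patches * (1 - mask_ratio)))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[:num_keep]), sorted(order[num_keep:])


if __name__ == "__main__":
    # Hypothetical 60-step summary signal: flat, then oscillating, then flat.
    signal = [0.0] * 20 + [1.0, -1.0] * 10 + [0.5] * 20
    windows = select_top_k_windows(signal, window=10, stride=10, k=2)
    print(windows)  # [(20, 30), (30, 40)]: only the oscillating windows are kept

    # Hypothetical patch count per selected window.
    visible, masked = drop_masked_patches(num_patches=100, mask_ratio=0.7)
    print(len(visible), len(masked))  # 30 70
```

The design choice the sketch highlights is that both stages shrink the encoder's input before any expensive computation: unselected windows are never encoded, and within selected windows the masked patches are removed rather than replaced by mask tokens. How much GPU memory this saves in practice depends on the real architecture and is reported in the paper, not derived here.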
