CV · Sep 10, 2025

VoxelFormer: Parameter-Efficient Multi-Subject Visual Decoding from fMRI

arXiv:2509.09015v1 · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the scalability and deployment limits of fMRI visual decoding, which matter for neuroscience and brain-computer interface applications. The contribution appears incremental, since it builds on existing transformer and CLIP-based approaches.

Most fMRI-based visual decoders must be trained separately for each subject. The paper introduces VoxelFormer, a lightweight transformer trained jointly across multiple subjects, which achieves competitive retrieval performance on the 7T Natural Scenes Dataset for subjects seen during training while using significantly fewer parameters than existing methods.
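The summary reports retrieval performance without defining the metric. As an illustration only, the sketch below computes one common form of it, top-1 retrieval accuracy in an embedding space: a predicted (brain-derived) embedding counts as correct if its most cosine-similar candidate image embedding is the true one. The function name, the candidate-pool setup, and the random toy data are assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def top1_retrieval_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Top-1 retrieval accuracy between paired embeddings.

    pred:   (N, d) embeddings predicted from fMRI (hypothetical inputs).
    target: (N, d) image embeddings, e.g. CLIP; row i is paired with pred[i].
    Row i is a hit when target[i] is the most cosine-similar candidate.
    """
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    sim = p @ t.T                                   # (N, N) cosine similarities
    hits = sim.argmax(axis=1) == np.arange(len(p))  # nearest candidate is the true one?
    return float(hits.mean())

# Toy usage: noisy "predictions" of random stand-in image embeddings.
rng = np.random.default_rng(0)
images = rng.standard_normal((300, 512))
preds = images + 0.5 * rng.standard_normal(images.shape)
print(top1_retrieval_accuracy(preds, images))
```

Chance level for this metric is 1/N, so a retrieval score is only interpretable together with the size of the candidate pool.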

Recent advances in fMRI-based visual decoding have enabled compelling reconstructions of perceived images. However, most approaches rely on subject-specific training, limiting scalability and practical deployment. We introduce VoxelFormer, a lightweight transformer architecture that enables multi-subject training for visual decoding from fMRI. VoxelFormer integrates a Token Merging Transformer (ToMer) for efficient voxel compression and a query-driven Q-Former that produces fixed-size neural representations aligned with the CLIP image embedding space. Evaluated on the 7T Natural Scenes Dataset, VoxelFormer achieves competitive retrieval performance on subjects included during training with significantly fewer parameters than existing methods. These results highlight token merging and query-based transformers as promising strategies for parameter-efficient neural decoding.
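The abstract names two components: token merging to compress voxel tokens, and a query-driven Q-Former that emits a fixed-size output. The NumPy sketch below illustrates why that pairing suits multi-subject training: inputs with different voxel counts still map to the same output shape. It is a minimal sketch under stated assumptions, not VoxelFormer itself. `merge_tokens` follows the bipartite soft-matching idea from Token Merging (ToMe) with a plain unweighted average, `query_pool` is a single-head cross-attention layer standing in for a full Q-Former, and the voxel tokenization, dimensions, and random weights are all hypothetical.

```python
import numpy as np

def merge_tokens(x: np.ndarray, r: int) -> np.ndarray:
    """Reduce n tokens to n - r via ToMe-style bipartite soft matching.

    Tokens are split alternately into sets A (even rows) and B (odd rows).
    Each A token finds its most cosine-similar B token; the r A tokens with
    the highest match scores are averaged into their partners, the rest kept.
    """
    a, b = x[0::2], x[1::2]
    r = min(r, len(a))
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                       # (|A|, |B|) cosine similarities
    best_b = sim.argmax(axis=1)           # best partner in B for each A token
    order = np.argsort(-sim.max(axis=1))  # most redundant A tokens first
    merged, kept = order[:r], order[r:]
    b_sum = b.copy()                      # copy so the input array is untouched
    counts = np.ones(len(b))
    np.add.at(b_sum, best_b[merged], a[merged])  # accumulate merged A tokens
    np.add.at(counts, best_b[merged], 1.0)
    return np.concatenate([a[kept], b_sum / counts[:, None]], axis=0)

def query_pool(tokens, queries, w_k, w_v):
    """Single-head cross-attention: learned queries attend over all tokens.

    Output shape is (num_queries, d) regardless of the number of input tokens.
    """
    k, v = tokens @ w_k, tokens @ w_v
    scores = queries @ k.T / np.sqrt(k.shape[1])  # (num_queries, n)
    scores -= scores.max(axis=1, keepdims=True)    # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
d, num_queries = 16, 8
queries = rng.standard_normal((num_queries, d))  # hypothetical learned queries
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_v = rng.standard_normal((d, d)) / np.sqrt(d)

# Two hypothetical subjects whose voxels yield different token counts.
for n_tokens in (120, 97):
    tokens = rng.standard_normal((n_tokens, d))
    compressed = merge_tokens(tokens, r=n_tokens // 4)
    out = query_pool(compressed, queries, w_k, w_v)
    print(n_tokens, compressed.shape, out.shape)
# prints "120 (90, 16) (8, 16)" then "97 (73, 16) (8, 16)"
```

In this sketch, the number of learned queries, not the number of voxel tokens, fixes the output shape, which is what lets inputs of different sizes share one downstream projection into the CLIP space.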
