CV · Apr 20

Subject-Aware Multi-Granularity Alignment for Zero-Shot EEG-to-Image Retrieval

Lin Jiang, Qingshan She, Jiale Xu, Haiqi Xu, Duanpo Wu, Zhenzhong Kuang

arXiv:2604.17782 · 67.3 · h-index: 19

Predicted impact: top 47% in CV · last 90 days
Originality: Incremental advance

AI Analysis

For brain-computer interface and neural decoding researchers, this work improves zero-shot EEG-to-image retrieval by addressing subject variability and multi-scale visual information, though it is an incremental improvement over existing alignment methods.

The paper tackles zero-shot EEG-to-image retrieval with a subject-aware multi-granularity alignment (SAMGA) framework, which adaptively aggregates visual representations so that the supervision target can reflect subject-dependent granularity. On the THINGS-EEG benchmark, the method achieves 91.3% Top-1 and 98.8% Top-5 accuracy in the intra-subject setting, and 34.4% Top-1 and 64.8% Top-5 accuracy in the inter-subject setting, outperforming prior state-of-the-art methods.
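To make the Top-1 and Top-5 figures concrete, here is a minimal sketch of how Top-k retrieval accuracy is commonly computed: each EEG embedding is compared against every candidate image embedding, and a query counts as a hit when its matching image ranks among the k most similar. Cosine similarity, the function names, and the toy vectors are assumptions for illustration; the paper's exact evaluation protocol is not described here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def top_k_accuracy(eeg_embs, img_embs, k):
    """Fraction of EEG queries whose matching image (same index) ranks in the top k."""
    hits = 0
    for i, query in enumerate(eeg_embs):
        sims = [cosine(query, img) for img in img_embs]
        # Candidate indices sorted from most to least similar to the query.
        ranked = sorted(range(len(img_embs)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(eeg_embs)

# Toy data (hypothetical): three image embeddings and three EEG embeddings,
# where EEG query i is supposed to retrieve image i.
img_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eeg_embs = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.0, 0.2, 0.9]]

top1 = top_k_accuracy(eeg_embs, img_embs, 1)
top2 = top_k_accuracy(eeg_embs, img_embs, 2)
```

In this toy case the second query is closest to the wrong image but still has its true match in second place, so it counts as a miss at k = 1 and a hit at k = 2.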

Zero-shot EEG-to-image retrieval aims to decode perceived visual content from electroencephalography (EEG) by aligning neural responses with pretrained visual representations, providing a promising route toward scalable visual neural decoding and practical brain-computer interfaces. However, robust EEG-to-image retrieval remains challenging because prior methods usually rely on either a single fixed visual target or a subject-invariant target construction scheme. Such designs overlook two important properties of visually evoked EEG signals: they preserve information across multiple representational scales, and the visual granularity best matched to EEG may vary across subjects. To address these issues, a subject-aware multi-granularity alignment (SAMGA) framework is proposed for zero-shot EEG-to-image retrieval. SAMGA first constructs a subject-aware visual supervision target by adaptively aggregating multiple intermediate representations from a pretrained vision encoder, allowing the model to absorb subject-dependent granularity deviations during training while preserving subject-agnostic inference. Building on this adaptive target construction, a coarse-to-fine cross-modal alignment strategy with a shared encoder is further designed: the coarse stage stabilizes the shared semantic geometry and reduces subject-induced distribution shift, and the fine stage further improves instance-level retrieval discrimination. Extensive experiments on the THINGS-EEG benchmark demonstrate that the proposed method achieves 91.3% Top-1 and 98.8% Top-5 accuracy in the intra-subject setting, and 34.4% Top-1 and 64.8% Top-5 accuracy in the inter-subject setting, outperforming recent state-of-the-art methods.
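The subject-aware target construction described above can be illustrated with a small sketch. It assumes, purely for illustration, that each subject owns one learnable logit per encoder layer, that the logits are turned into weights by a softmax, and that the layer features have already been projected to a common dimension. None of these details come from the abstract, and the names `aggregate_layers` and `subject_logits` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_layers(layer_feats, logits):
    """Weighted sum of per-layer feature vectors, with softmax(logits) as weights."""
    weights = softmax(logits)
    dim = len(layer_feats[0])
    return [sum(w * feat[d] for w, feat in zip(weights, layer_feats))
            for d in range(dim)]

# Hypothetical features of one image taken from three encoder layers
# (assumed to share the same dimension after projection).
layer_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical per-subject logits: sub-01 favours the first layer,
# sub-02 favours the last layer.
subject_logits = {"sub-01": [2.0, 0.0, 0.0], "sub-02": [0.0, 0.0, 2.0]}

# One supervision target per subject for the same image.
targets = {s: aggregate_layers(layer_feats, l) for s, l in subject_logits.items()}
```

Under these assumptions the same image yields a different training target for each subject, while the EEG encoder itself never receives a subject label, which is one way to read the abstract's combination of subject-aware training and subject-agnostic inference.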
