sEEG-based Encoding for Sentence Retrieval: A Contrastive Learning Approach to Brain-Language Alignment
This work addresses brain-language alignment, a problem of interest to neuroscience and AI researchers, but its contribution is incremental: it builds on existing contrastive learning methods and foundation models.
The paper tackles the challenge of aligning invasive brain recordings with natural language by developing SSENSE, a contrastive learning framework that projects sEEG signals into a sentence embedding space. This enables sentence-level retrieval from brain activity, and the authors report promising results on a naturalistic movie-watching dataset.
Interpreting neural activity through meaningful latent representations remains a complex and evolving challenge at the intersection of neuroscience and artificial intelligence. We investigate the potential of multimodal foundation models to align invasive brain recordings with natural language. We present SSENSE, a contrastive learning framework that projects single-subject stereo-electroencephalography (sEEG) signals into the sentence embedding space of a frozen CLIP model, enabling sentence-level retrieval directly from brain activity. SSENSE trains a neural encoder on spectral representations of sEEG using InfoNCE loss, without fine-tuning the text encoder. We evaluate our method on time-aligned sEEG and spoken transcripts from a naturalistic movie-watching dataset. Despite limited data, SSENSE achieves promising results, demonstrating that general-purpose language representations can serve as effective priors for neural decoding.
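To make the training objective and the retrieval step concrete, the following is a minimal NumPy sketch of an InfoNCE loss between brain-derived embeddings and frozen sentence embeddings, followed by cosine-similarity retrieval. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature of 0.07, the brain-to-text direction of the loss, and the use of plain arrays in place of the sEEG encoder outputs and CLIP text embeddings are all hypothetical choices made here.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(brain_emb, text_emb, temperature=0.07):
    """InfoNCE loss in the brain-to-text direction.

    Row i of brain_emb is assumed to be paired with row i of text_emb;
    every other sentence in the batch acts as a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    b = l2_normalize(brain_emb)
    t = l2_normalize(text_emb)
    logits = (b @ t.T) / temperature  # shape: (batch, batch)

    # Numerically stable log-softmax over the candidate sentences.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching sentence for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))


def retrieve_top_k(brain_emb, text_emb, k=1):
    """Return indices of the k most similar sentences for each brain embedding."""
    sims = l2_normalize(brain_emb) @ l2_normalize(text_emb).T
    return np.argsort(-sims, axis=1)[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for frozen sentence embeddings (hypothetical sizes).
    text = rng.normal(size=(8, 16))
    # Stand-in for encoder outputs: the paired text embedding plus noise.
    brain = text + 0.1 * rng.normal(size=text.shape)

    print("loss:", info_nce_loss(brain, text))
    print("top-1 indices:", retrieve_top_k(brain, text, k=1).ravel())
```

In this sketch only the brain-side embeddings would receive gradients in a real training loop, mirroring the frozen text encoder described above. A symmetric variant that averages the brain-to-text and text-to-brain losses is a common alternative; which one SSENSE uses is not stated here.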