NC CV LG | Jul 23, 2025

Multimodal Recurrent Ensembles for Predicting Brain Responses to Naturalistic Movies (Algonauts 2025)

Semih Eren, Deniz Kucukahmetler, Nico Scherf

arXiv:2507.17897v45.95 citations | h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of modeling distributed cortical responses to naturalistic stimuli. The advance is incremental: it builds on existing multimodal and recurrent methods for brain encoding.

The authors predict brain responses to naturalistic movies with a hierarchical multimodal recurrent ensemble that integrates visual, auditory, and semantic information. The system achieved an overall Pearson r = 0.2094 and a peak single-parcel score of mean r = 0.63, ranking third in the Algonauts 2025 challenge.

Accurately predicting distributed cortical responses to naturalistic stimuli requires models that integrate visual, auditory and semantic information over time. We present a hierarchical multimodal recurrent ensemble that maps pretrained video, audio, and language embeddings to fMRI time series recorded while four subjects watched almost 80 hours of movies provided by the Algonauts 2025 challenge. Modality-specific bidirectional RNNs encode temporal dynamics; their hidden states are fused and passed to a second recurrent layer, and lightweight subject-specific heads output responses for 1000 cortical parcels. Training relies on a composite MSE-correlation loss and a curriculum that gradually shifts emphasis from early sensory to late association regions. Averaging 100 model variants further boosts robustness. The resulting system ranked third on the competition leaderboard, achieving an overall Pearson r = 0.2094 and the highest single-parcel peak score (mean r = 0.63) among all participants, with particularly strong gains for the most challenging subject (Subject 5). The approach establishes a simple, extensible baseline for future multimodal brain-encoding benchmarks.
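The abstract names a composite MSE-correlation loss and an average over 100 model variants but gives no formula for either. The NumPy sketch below shows one plausible form, not the authors' implementation. The mixing weight `alpha`, the `parcel_weights` hook (one way a curriculum could re-weight sensory versus association parcels), the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pearson_per_parcel(pred, target, eps=1e-8):
    """Pearson r for each parcel (column); inputs have shape (time, parcels)."""
    pred_c = pred - pred.mean(axis=0, keepdims=True)
    target_c = target - target.mean(axis=0, keepdims=True)
    cov = (pred_c * target_c).sum(axis=0)
    denom = np.sqrt((pred_c ** 2).sum(axis=0) * (target_c ** 2).sum(axis=0)) + eps
    return cov / denom


def composite_loss(pred, target, alpha=0.5, parcel_weights=None):
    """Assumed form: alpha * MSE + (1 - alpha) * (1 - r), averaged over parcels.

    parcel_weights is a hypothetical hook: a curriculum could change these
    weights during training to shift emphasis between groups of parcels.
    """
    mse = ((pred - target) ** 2).mean(axis=0)
    corr_term = 1.0 - pearson_per_parcel(pred, target)
    per_parcel = alpha * mse + (1.0 - alpha) * corr_term
    if parcel_weights is None:
        return float(per_parcel.mean())
    w = np.asarray(parcel_weights, dtype=float)
    return float((w * per_parcel).sum() / w.sum())


def ensemble_average(predictions):
    """Average predictions from several model variants, each (time, parcels)."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


# Synthetic stand-in data: 50 time points, 1000 parcels, 100 noisy "variants".
rng = np.random.default_rng(0)
target = rng.standard_normal((50, 1000))
variants = [target + 2.0 * rng.standard_normal(target.shape) for _ in range(100)]

single_r = pearson_per_parcel(variants[0], target).mean()
ensemble_pred = ensemble_average(variants)
ensemble_r = pearson_per_parcel(ensemble_pred, target).mean()
print(f"single variant mean r: {single_r:.3f}")
print(f"ensemble mean r:       {ensemble_r:.3f}")
```

In this toy setup each variant's errors are independent, so averaging cancels most of the noise and the ensemble correlation is far higher than any single variant's. Real model variants make correlated errors, so the gain from averaging would be smaller.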
