Stacked Regression using Off-the-shelf, Stimulus-tuned and Fine-tuned Neural Networks for Predicting fMRI Brain Responses to Movies (Algonauts 2025 Report)
This work addresses the problem of modeling brain activity evoked by multimodal stimuli, a task of interest to neuroscience researchers. Its contribution is incremental: it builds on existing methods rather than introducing a major methodological advance.
The paper tackles the prediction of fMRI brain responses to movie stimuli by integrating multimodal representations from several models and combining their predictions with stacked regression. The submission ranked 10th in the Algonauts 2025 Challenge.
We present our submission to the Algonauts 2025 Challenge, where the goal is to predict fMRI brain responses to movie stimuli. Our approach integrates multimodal representations from large language models, video encoders, audio models, and vision-language models, combining both off-the-shelf and fine-tuned variants. To improve performance, we enhance textual inputs with detailed transcripts and summaries, and we explore stimulus-tuning and fine-tuning strategies for language and vision models. Predictions from individual models are combined using stacked regression, yielding solid results. Our submission, under the team name Seinfeld, ranked 10th in the challenge. We make all code and resources publicly available, contributing to ongoing efforts in developing multimodal encoding models for brain activity.
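The idea of combining per-model predictions with stacked regression can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes ridge regression as the first-level model for each feature space, out-of-fold predictions on the training set, and an unconstrained per-target least-squares combiner (stacking variants often constrain these weights to be non-negative). All names (`ridge_fit`, `fit_stack`, `predict_stack`, `X_text`, `X_video`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights: (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def out_of_fold_predictions(X, Y, n_folds=5, alpha=1.0):
    """Predict every sample with a ridge model that was not trained on it."""
    n = X.shape[0]
    preds = np.zeros_like(Y, dtype=float)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        W = ridge_fit(X[train_mask], Y[train_mask], alpha)
        preds[test_idx] = X[test_idx] @ W
    return preds


def fit_stack(feature_sets, Y, n_folds=5, alpha=1.0):
    """Fit one ridge model per feature space and per-target stacking weights."""
    # Out-of-fold predictions, shape (n_models, n_samples, n_targets).
    oof = np.stack(
        [out_of_fold_predictions(X, Y, n_folds, alpha) for X in feature_sets]
    )
    n_models, _, n_targets = oof.shape
    stack_w = np.zeros((n_models, n_targets))
    for t in range(n_targets):
        A = oof[:, :, t].T  # (n_samples, n_models)
        # Assumption: plain least squares; no non-negativity constraint.
        stack_w[:, t], *_ = np.linalg.lstsq(A, Y[:, t], rcond=None)
    # Refit the first-level models on all training data for use at test time.
    base_w = [ridge_fit(X, Y, alpha) for X in feature_sets]
    return base_w, stack_w


def predict_stack(feature_sets, base_w, stack_w):
    """Weighted sum of the first-level predictions, separately per target."""
    preds = np.stack([X @ W for X, W in zip(feature_sets, base_w)])
    return np.einsum("mnt,mt->nt", preds, stack_w)


# Synthetic stand-ins for two feature spaces and a few response targets.
rng = np.random.default_rng(0)
n_samples, n_targets = 200, 4
X_text = rng.standard_normal((n_samples, 8))
X_video = rng.standard_normal((n_samples, 6))
Y = (
    X_text @ rng.standard_normal((8, n_targets))
    + X_video @ rng.standard_normal((6, n_targets))
    + 0.1 * rng.standard_normal((n_samples, n_targets))
)

train, test = slice(0, 150), slice(150, None)
base_w, stack_w = fit_stack([X_text[train], X_video[train]], Y[train])
Y_hat = predict_stack([X_text[test], X_video[test]], base_w, stack_w)

mse_stacked = np.mean((Y_hat - Y[test]) ** 2)
mse_text_only = np.mean((X_text[test] @ base_w[0] - Y[test]) ** 2)
print(f"stacked MSE: {mse_stacked:.3f}, text-only MSE: {mse_text_only:.3f}")
```

The stacking weights are learned from out-of-fold predictions so that the combiner does not simply reward whichever first-level model overfits the training data most. In an fMRI setting the targets would be voxels or parcels and the rows would be time points, in which case contiguous rather than shuffled folds are the safer choice because neighboring time points are correlated.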