Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation
Using multimodal LLMs as proxy annotators provides a more ecologically valid method for investigating brain semantic organization in cognitive neuroscience, overcoming the limitations of traditional manual annotation.
The researchers tackled the challenge of modeling brain semantic representation by using multimodal large language models (LLMs) as proxies that extract semantic information from naturalistic images via Visual Question Answering. The resulting representations predicted established fMRI activity patterns (e.g., responses to faces and buildings) and revealed a hierarchical semantic organization across cortical regions.
Traditional psychological experiments face a trade-off: naturalistic stimuli improve ecological validity but are costly to annotate manually. To address this, we introduce a novel paradigm that uses multimodal large language models (LLMs) as proxies to extract rich semantic information from naturalistic images through a Visual Question Answering (VQA) strategy, and we use this information to analyze human visual semantic representation. LLM-derived representations successfully predict established neural activity patterns measured by fMRI (e.g., responses to faces and buildings), which validates the feasibility of the approach, and they reveal a hierarchical semantic organization across cortical regions. A brain semantic network constructed from the LLM-derived representations contains meaningful clusters that reflect functional and contextual associations. This methodology offers a practical way to investigate brain semantic organization with naturalistic stimuli, overcoming the limitations of traditional annotation methods and paving the way for more ecologically valid explorations of human cognition.
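
To make the prediction step concrete, the following is a minimal sketch of how VQA-derived features could be mapped to fMRI responses. It rests on several assumptions that the abstract does not state: the VQA answers are encoded as binary yes/no features, the mapping is a linear ridge regression (a common choice for fMRI encoding models, not necessarily the one used here), and accuracy is scored as the per-voxel Pearson correlation on held-out images. All data are synthetic, and the names `fit_ridge` and `voxelwise_correlation` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
n_images, n_questions, n_voxels = 200, 5, 10

# Synthetic stand-in for VQA answers (assumption): 1 = "yes", 0 = "no"
# to questions such as "Is there a face?" or "Is there a building?".
X = rng.integers(0, 2, size=(n_images, n_questions)).astype(float)

# Synthetic stand-in for fMRI responses (assumption): a linear mix of
# the semantic features plus Gaussian noise.
true_W = rng.normal(size=(n_questions, n_voxels))
Y = X @ true_W + 0.5 * rng.normal(size=(n_images, n_voxels))

# Fit on the first 150 images, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, 200)
W = fit_ridge(X[train], Y[train], alpha=1.0)
r = voxelwise_correlation(Y[test], X[test] @ W)

print("mean held-out correlation:", r.mean())
```

In this sketch each column of `W` can be read as a semantic tuning profile for one voxel, for example how strongly it responds when the answer to the face question is "yes". With real data, `alpha` would normally be chosen by cross-validation rather than fixed.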