Language learning shapes visual category-selectivity in deep neural networks
This research examines how language shapes visual representations in AI models and offers insights into brain-like organization, although it is incremental, building on existing neural-network and fMRI studies.
The study investigated whether artificial neural networks develop category-selective neurons similar to those in the human brain and how language learning influences these representations. It found that language-supervised models had more, but less specific, category-selective neurons, with reduced spatial localization and weaker activation, indicating a shift toward distributed, semantically aligned coding. These effects were replicated in CLIP.
Category-selective regions in the human brain, such as the fusiform face area (FFA), extrastriate body area (EBA), parahippocampal place area (PPA), and visual word form area (VWFA), support high-level visual recognition. Here, we investigate whether artificial neural networks (ANNs) exhibit analogous category-selective neurons and how these representations are shaped by language experience. Using an fMRI-inspired functional localizer approach, we identified face-, body-, place-, and word-selective neurons by presenting deep networks with category images and scrambled controls. Both the purely visual ResNet and a linguistically supervised Lang-Learned ResNet contained category-selective neurons whose proportion increased across layers. However, compared to the vision-only model, the Lang-Learned ResNet showed a greater number but lower specificity of category-selective neurons, along with reduced spatial localization and attenuated activation strength, indicating a shift toward more distributed, semantically aligned coding. These effects were replicated in the large-scale vision-language model CLIP. Together, our findings reveal that language experience systematically reorganizes visual category representations in ANNs, providing a computational parallel to how linguistic context may shape categorical organization in the human brain.
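The functional localizer idea can be illustrated with a short sketch. This is not the authors' pipeline. It assumes that unit activations have already been extracted from one network layer as arrays of shape (n_images, n_units) per stimulus condition, and it labels a unit as selective for a target category when its Welch t-statistic against every other condition (the remaining categories and the scrambled controls) exceeds a fixed cutoff. The names `welch_t`, `localize`, and `t_thresh`, the cutoff of 3.0, and the synthetic data are all hypothetical.

```python
import numpy as np


def welch_t(a, b):
    """Per-unit Welch t-statistic for a vs. b; both have shape (n_images, n_units)."""
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Unpooled standard error (sample variances, ddof=1), as in Welch's test.
    se = np.sqrt(a.var(axis=0, ddof=1) / a.shape[0] + b.var(axis=0, ddof=1) / b.shape[0])
    return mean_diff / np.maximum(se, 1e-12)  # guard against zero variance


def localize(responses, target, t_thresh=3.0):
    """Boolean mask of units selective for `target` (hypothetical criterion).

    responses: dict mapping condition name -> array (n_images, n_units),
    including a "scrambled" control condition.
    A unit is marked selective only if its response to `target` beats
    every other condition with a t-statistic above t_thresh.
    """
    target_resp = responses[target]
    mask = np.ones(target_resp.shape[1], dtype=bool)
    for name, resp in responses.items():
        if name == target:
            continue
        mask &= welch_t(target_resp, resp) > t_thresh
    return mask


# Synthetic demo: 50 images per condition, 8 units in one layer.
rng = np.random.default_rng(0)
conditions = ["face", "body", "place", "word", "scrambled"]
responses = {c: rng.normal(0.0, 1.0, size=(50, 8)) for c in conditions}
responses["face"][:, 0] += 3.0  # make unit 0 respond strongly to faces

mask = localize(responses, "face")
print(mask[0], mask.mean())  # unit 0's label and the fraction of face-selective units
```

Running `localize` on each layer's activations and taking `mask.mean()` yields a per-layer proportion of selective units, which is the kind of quantity the abstract describes as increasing across layers. A stricter analysis might instead convert the statistics to p-values and correct for multiple comparisons rather than using a fixed t cutoff.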