NCCV Feb 23, 2025

Language learning shapes visual category-selectivity in deep neural networks

arXiv:2502.16456v21 citations, h-index: 2
Originality: Incremental advance
AI Analysis

This research addresses how language shapes visual representations in AI models, offering insights into brain-like organization, but it is incremental as it builds on existing neural network and fMRI studies.

The study investigated whether artificial neural networks develop category-selective neurons similar to those in the human brain, and how language learning influences these representations. It found that language-supervised models had more category-selective neurons than the vision-only model, but that these neurons were less specific, less spatially localized, and less strongly activated. This indicates a shift toward distributed, semantically aligned coding. The same effects were replicated in CLIP.

Category-selective regions in the human brain, such as the fusiform face area (FFA), extrastriate body area (EBA), parahippocampal place area (PPA), and visual word form area (VWFA), support high-level visual recognition. Here, we investigate whether artificial neural networks (ANNs) exhibit analogous category-selective neurons and how these representations are shaped by language experience. Using an fMRI-inspired functional localizer approach, we identified face-, body-, place-, and word-selective neurons in deep networks presented with category images and scrambled controls. Both the purely visual ResNet and a linguistically supervised Lang-Learned ResNet contained category-selective neurons that increased in proportion across layers. However, compared to the vision-only model, the Lang-Learned ResNet showed a greater number but lower specificity of category-selective neurons, along with reduced spatial localization and attenuated activation strength, indicating a shift toward more distributed, semantically aligned coding. These effects were replicated in the large-scale vision-language model CLIP. Together, our findings reveal that language experience systematically reorganizes visual category representations in ANNs, providing a computational parallel to how linguistic context may shape categorical organization in the human brain.
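The functional localizer idea in the abstract (contrast each unit's response to category images against its response to scrambled controls) can be illustrated with a small sketch. The abstract does not state the selectivity criterion, so this example assumes a per-unit Welch t-statistic with a fixed cutoff. The names `welch_t`, `localize_selective_units`, and `t_threshold`, as well as the synthetic activations, are hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Units with no variability in either condition (e.g. dead ReLUs)
    # are treated as non-selective here; this is an assumption.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def localize_selective_units(category_acts, scrambled_acts, t_threshold=3.0):
    """Return indices of units that respond more to category images
    than to scrambled controls.

    category_acts, scrambled_acts: one activation vector per image
    (a list of floats, one per unit), e.g. a flattened layer output.
    t_threshold: assumed cutoff, not a value reported in the paper.
    """
    n_units = len(category_acts[0])
    selective = []
    for u in range(n_units):
        cat = [img[u] for img in category_acts]
        scr = [img[u] for img in scrambled_acts]
        if welch_t(cat, scr) > t_threshold:
            selective.append(u)
    return selective


# Synthetic demo: 6 units, of which units 1 and 4 respond strongly to "faces".
rng = random.Random(0)
n_images, n_units = 40, 6
boosted = {1, 4}
scrambled = [[rng.gauss(0.0, 1.0) for _ in range(n_units)]
             for _ in range(n_images)]
faces = [[rng.gauss(5.0 if u in boosted else 0.0, 1.0) for u in range(n_units)]
         for _ in range(n_images)]

selective = localize_selective_units(faces, scrambled)
proportion = len(selective) / n_units  # share of selective units in this "layer"
```

Running the same contrast layer by layer, and once per category (faces, bodies, places, words), would give the kind of per-layer proportions that the abstract compares between the vision-only and language-supervised models.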
