NCAI · Oct 3, 2025

Brain-Language Model Alignment: Insights into the Platonic Hypothesis and Intermediate-Layer Advantage

arXiv:2510.17833v1 · 5 citations · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses brain-model alignment, a question of interest to researchers in both neuroscience and AI. It is incremental: it reviews existing studies rather than introducing new methods.

This paper reviews 25 fMRI-based studies to investigate whether brains and language models converge toward similar internal representations, and finds evidence supporting both the Platonic Representation Hypothesis and the Intermediate-Layer Advantage.

Do brains and language models converge toward the same internal representations of the world? Recent years have seen a rise in studies that align neural activations with language-model representations. In this work, we review 25 fMRI-based studies published between 2023 and 2025 and explicitly confront their findings with two key hypotheses: (i) the Platonic Representation Hypothesis -- that as models scale and improve, they converge to a representation of the real world, and (ii) the Intermediate-Layer Advantage -- that intermediate (mid-depth) layers often encode richer, more generalizable features. Our findings provide converging evidence that models and brains may share abstract representational structures, supporting both hypotheses and motivating further research on brain-model alignment.
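The abstract does not commit to a single alignment metric, so the following is only a minimal sketch of one common way to score brain-model alignment layer by layer: representational similarity analysis (RSA). The sketch assumes that each layer and the brain supply a feature vector per stimulus. It builds a correlation-distance dissimilarity matrix (RDM) for each and compares the RDMs with Spearman's rho. All data here are synthetic. The helper names (`rsa_score`, `layer_profile`, `make_demo`) and the mixing weights that give the middle layer the most shared signal are hypothetical choices. They illustrate what an intermediate-layer peak looks like; they are not evidence for one.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rdm(features):
    """Condensed RDM: 1 - Pearson r for every pair of stimuli (upper triangle)."""
    return [1.0 - pearson(features[i], features[j])
            for i, j in combinations(range(len(features)), 2)]


def ranks(values):
    """1-based average ranks (ties share their mean rank), as in Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def rsa_score(model_feats, brain_feats):
    """Spearman correlation between the model RDM and the brain RDM."""
    return pearson(ranks(rdm(model_feats)), ranks(rdm(brain_feats)))


def layer_profile(layers, brain_feats):
    """RSA score for each layer, in depth order."""
    return [rsa_score(layer, brain_feats) for layer in layers]


def make_demo(n_stimuli=12, dim=40, seed=0):
    """Synthetic stimuli: brain and layers share a hypothetical latent signal."""
    rng = random.Random(seed)

    def noise():
        return rng.gauss(0.0, 1.0)

    shared = [[noise() for _ in range(dim)] for _ in range(n_stimuli)]

    def mix(w):
        # w = weight on the shared signal, (1 - w) on layer-private noise
        return [[w * s + (1.0 - w) * noise() for s in row] for row in shared]

    brain = [[s + 0.3 * noise() for s in row] for row in shared]
    layers = [mix(0.1), mix(0.9), mix(0.4)]  # early, middle, late
    return layers, brain


layers, brain = make_demo()
profile = layer_profile(layers, brain)
print("RSA per layer:", [round(r, 3) for r in profile])
print("best layer index:", profile.index(max(profile)))
```

With real data, the model features would be a layer's activations for each stimulus and the brain features would be fMRI response patterns for the same stimuli. Linear encoding models, such as ridge regression from activations to voxel responses, are another common alternative to RSA.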

