Barycentric alignment for instance-level comparison of neural representations
This work is aimed at researchers in machine learning and neuroscience who compare neural network representations. It offers a method for instance-level analysis that reveals phenomena set-level comparison metrics cannot detect.
The paper tackles the problem of comparing neural representations across models by introducing a barycentric alignment framework that quotients out symmetries to enable instance-level similarity. This reveals systematic input properties that predict convergence or divergence across models. Post-hoc alignment of unimodal vision and language models also yields cross-modal similarity scores that track human judgments and approach the performance of contrastively trained models.
Comparing representations across neural networks is challenging because representations admit symmetries, such as arbitrary reordering of units or rotations of activation space, that obscure underlying equivalence between models. We introduce a barycentric alignment framework that quotients out these nuisance symmetries to construct a universal embedding space across many models. Unlike existing similarity measures, which summarize relationships over entire stimulus sets, this framework defines similarity at the level of individual stimuli, revealing which inputs elicit convergent and which elicit divergent representations across models. Using this instance-level notion of similarity, we identify systematic input properties that predict representational convergence versus divergence across vision and language model families. We also construct universal embedding spaces for brain representations across individuals and cortical regions, enabling instance-level comparison of representational agreement across stages of the human visual hierarchy. Finally, we apply the same barycentric alignment framework to purely unimodal vision and language models and find that post-hoc alignment into a shared space yields image-text similarity scores that closely track human cross-modal judgments and approach the performance of contrastively trained vision-language models. This suggests that independently learned representations already share sufficient geometric structure for human-aligned cross-modal comparison. Together, these results show that resolving representational similarity at the level of individual stimuli reveals phenomena that cannot be detected by set-level comparison metrics.
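To make the idea of instance-level similarity concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the alignment procedure, so the sketch assumes that the nuisance symmetry is an orthogonal transformation, that all models share the same embedding dimension, and that the barycenter can be estimated by generalized Procrustes iteration. The function names (`procrustes_rotation`, `barycentric_align`, `instance_divergence`) are hypothetical.

```python
import numpy as np


def procrustes_rotation(X, Y):
    """Orthogonal Q minimizing ||X @ Q - Y||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def barycentric_align(reps, n_iter=20):
    """Align each representation to an iteratively updated mean.

    reps: list of (n_stimuli, d) arrays, one per model, with rows in
    the same stimulus order. Assumption: every model has the same d.
    """
    # Remove translation and overall scale before fitting rotations.
    reps = [R - R.mean(axis=0) for R in reps]
    reps = [R / np.linalg.norm(R) for R in reps]

    bary = reps[0].copy()  # initialize the barycenter at the first model
    for _ in range(n_iter):
        aligned = [R @ procrustes_rotation(R, bary) for R in reps]
        bary = np.mean(aligned, axis=0)
    return aligned, bary


def instance_divergence(aligned, bary):
    """Per-stimulus mean squared distance to the barycenter.

    Low values mark stimuli on which the models converge;
    high values mark stimuli on which they diverge.
    """
    A = np.stack(aligned)  # shape (n_models, n_stimuli, d)
    return np.mean(np.sum((A - bary) ** 2, axis=-1), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(100, 5))  # shared latent structure (synthetic)
    # Each synthetic "model" is a random orthogonal transform of Z.
    models = [Z @ np.linalg.qr(rng.normal(size=(5, 5)))[0] for _ in range(4)]
    aligned, bary = barycentric_align(models)
    scores = instance_divergence(aligned, bary)
    print(scores.shape)
```

A set-level metric would collapse `scores` into a single number per model pair. Keeping the per-stimulus vector is what allows individual inputs to be ranked by how strongly the models agree on them.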