Universal dimensions of visual representation
This work addresses why artificial and biological vision align, a question relevant to researchers in both neuroscience and AI. It suggests that a core set of universal image representations underlies this alignment, though it builds incrementally on prior work.
The study investigated whether neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural images. It found that diverse networks converge on a shared set of latent dimensions, and that keeping fewer than ten of each network's most universal dimensions largely preserves its representational similarity to the human brain.
Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network's specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.
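To make the idea of a "universal" dimension concrete, the following is a minimal sketch on synthetic data, not the procedure used in the study. It assumes that a network's latent dimensions can be taken as its principal components, and that a dimension's universality can be scored by how well it is linearly predicted, on held-out images, from the activations of other networks. The function names, the ridge penalty, and the toy data generator are all hypothetical choices made for illustration.

```python
import numpy as np


def latent_dimensions(acts, k):
    """Scores of the top-k principal components of an (images x units) array."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def heldout_predictability(target, source, n_train, ridge=1.0):
    """Held-out Pearson r of each target column, ridge-predicted from source."""
    x_tr, x_te = source[:n_train], source[n_train:]
    y_tr, y_te = target[:n_train], target[n_train:]
    mu_x, mu_y = x_tr.mean(axis=0), y_tr.mean(axis=0)
    xc = x_tr - mu_x
    # Closed-form ridge regression fitted on the training images only.
    gram = xc.T @ xc + ridge * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ (y_tr - mu_y))
    pred = (x_te - mu_x) @ weights + mu_y
    # Column-wise Pearson correlation between predictions and held-out values.
    p = pred - pred.mean(axis=0)
    y = y_te - y_te.mean(axis=0)
    return (p * y).sum(axis=0) / (np.linalg.norm(p, axis=0) * np.linalg.norm(y, axis=0))


def universality(networks, index, k, n_train):
    """Assumed score: mean held-out predictability from every other network."""
    dims = latent_dimensions(networks[index], k)
    others = [acts for j, acts in enumerate(networks) if j != index]
    return np.mean([heldout_predictability(dims, o, n_train) for o in others], axis=0)


def make_toy_networks(n_networks=4, n_images=600, n_units=20, seed=0):
    """Synthetic 'networks': 3 shared latent factors plus 3 private ones each."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_images, 3))
    networks = []
    for _ in range(n_networks):
        private = rng.normal(size=(n_images, 3))
        mix_shared = rng.normal(size=(3, n_units))
        mix_private = rng.normal(size=(3, n_units))
        # Shared factors are given high variance so they dominate the top PCs.
        networks.append(10.0 * shared @ mix_shared + private @ mix_private)
    return networks


networks = make_toy_networks()
scores = universality(networks, index=0, k=6, n_train=400)
print(np.round(scores, 2))  # one score per latent dimension of network 0
```

In this toy setup the first three principal components are driven by the shared factors and should score high, while the remaining three are driven by network-specific factors and should score low. Ranking dimensions by such a score and keeping only the top few is one way to picture the reduction to a small universal subspace described above.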