Human-like object concept representations emerge naturally in multimodal large language models
This work addresses the problem of understanding machine intelligence and developing more human-like AI systems, though it is incremental in building on existing LLM research.
The study investigated whether large language models (LLMs) and multimodal LLMs (MLLMs) develop human-like object concept representations, finding that their 66-dimensional embeddings from 4.7 million triplet judgments on 1,854 objects showed semantic clustering and neural alignment similar to human cognition.
Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition. With the advent of Large Language Models (LLMs), a key question arises: can these models develop human-like object representations from linguistic and multimodal data? In this study, we combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition. We collected 4.7 million triplet judgments from LLMs and Multimodal LLMs (MLLMs) to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were stable, predictive, and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and MLLMs develop human-like conceptual representations of objects. Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as EBA, PPA, RSC, and FFA. This provides compelling evidence that the object representations in LLMs, while not identical to human ones, share fundamental similarities that reflect key aspects of human conceptual knowledge. Our findings advance the understanding of machine intelligence and inform the development of more human-like artificial cognitive systems.