Does Conceptual Representation Require Embodiment? Insights From Large Language Models
This study addresses whether language alone suffices to form complex concepts, a question with implications for AI and cognitive science, though its contribution is incremental in that it compares existing models.
The study compared conceptual representations of 4,442 lexical concepts between humans and two ChatGPT models (GPT-3.5 and GPT-4). Both models align with humans in non-sensorimotor domains but lag in sensory and motor areas; GPT-4 outperforms GPT-3.5, and its gains are associated with its additional visual learning.
To what extent can language alone give rise to complex concepts, or is embodied experience essential? Recent advancements in large language models (LLMs) offer fresh perspectives on this question. Although LLMs are trained on restricted modalities, they exhibit human-like performance in diverse psychological tasks. Our study compared representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4) across multiple dimensions spanning five key domains: emotion, salience, mental visualization, sensory, and motor experience. We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability. These results highlight the limitations of language in isolation and suggest that integrating diverse input modalities leads to a more human-like conceptual representation.
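One way to picture the comparison described above is as a per-dimension correlation between human and model ratings of the same words. The sketch below assumes alignment is scored with Spearman rank correlation, which the abstract does not specify. The function names, the dimension labels used as dictionary keys, and all ratings are hypothetical and serve only as illustration.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def alignment_by_dimension(human, model):
    """Correlate human and model ratings per dimension over shared words.

    Both arguments map dimension name -> {word: rating}. Dimensions with
    fewer than two shared words are skipped.
    """
    scores = {}
    for dim, human_ratings in human.items():
        words = sorted(set(human_ratings) & set(model.get(dim, {})))
        if len(words) < 2:
            continue
        scores[dim] = spearman(
            [human_ratings[w] for w in words],
            [model[dim][w] for w in words],
        )
    return scores


# Made-up ratings for illustration only; these are NOT data from the study.
human = {
    "imageability": {"apple": 6.5, "justice": 2.1, "hammer": 6.0, "idea": 1.8},
    "haptic": {"apple": 3.9, "justice": 0.3, "hammer": 4.4, "idea": 0.2},
}
model = {
    "imageability": {"apple": 6.8, "justice": 2.5, "hammer": 6.1, "idea": 2.0},
    "haptic": {"apple": 2.0, "justice": 0.5, "hammer": 1.5, "idea": 0.9},
}

for dim, rho in alignment_by_dimension(human, model).items():
    print(f"{dim}: rho = {rho:.2f}")
```

A rank-based measure is used here because rating scales can differ between human norms and model outputs. Any other agreement measure could be swapped in without changing the structure of the comparison.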