AI · Sep 20, 2024
Failures in Perspective-taking of Multimodal AI Systems
Bridget Leonard, Kristin Woodard, Scott O. Murray
This study extends previous research on spatial representations in multimodal AI systems. Although current models demonstrate a rich understanding of spatial information from images, this information is rooted in propositional representations, which differ from the analog representations employed in human and animal spatial cognition. To further explore these limitations, we apply techniques from cognitive and developmental science to assess the perspective-taking abilities of GPT-4o. Our analysis enables a comparison between the cognitive development of the human brain and that of multimodal AI, offering guidance for future research and model development.
CL · Jun 22, 2020
Mental representations of objects reflect the ways in which we interact with them
Ka Chun Lam, Francisco Pereira, Maryam Vaziri-Pashkam et al.
In order to interact with objects in our environment, humans rely on an understanding of the actions that can be performed on them, as well as of their properties. When considering concrete motor actions, this knowledge has been called the object affordance. Can this notion be generalized to any type of interaction that one can have with an object? In this paper we introduce a method to represent objects in a space where each dimension corresponds to a broad mode of interaction, based on verb selectional preferences in text corpora. This object embedding predicts human judgments of verb applicability to objects better than a variety of alternative approaches. Furthermore, we show that the dimensions in this space can be used to predict categorical and functional dimensions in a state-of-the-art mental representation of objects, derived solely from human judgments of object similarity. These results suggest that interaction knowledge accounts for a large part of mental representations of objects.
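The abstract does not say how selectional preferences are computed or how verbs are grouped into broad modes of interaction. Purely as an illustration of the general idea of an object embedding built from verb usage, the sketch below scores verb-object association with positive pointwise mutual information (PPMI) over (verb, object) pairs, a common stand-in for selectional preference. This is an assumption, not the authors' method: here each dimension is a single verb rather than a broad mode of interaction, and the function name `ppmi_embeddings` and the toy pairs are hypothetical.

```python
import math
from collections import Counter

def ppmi_embeddings(pairs):
    """Hypothetical sketch: build object vectors from (verb, object) pairs.

    Each dimension is one verb; the value is the positive PMI between that
    verb and the object (a stand-in for selectional preference, not the
    paper's actual procedure).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)
    total = len(pairs)
    verbs = sorted(verb_counts)

    embeddings = {}
    for obj in sorted(obj_counts):
        vec = []
        for verb in verbs:
            joint = pair_counts[(verb, obj)]
            if joint == 0:
                # Never observed together: PPMI is clipped to zero.
                vec.append(0.0)
                continue
            # PMI = log( P(v, o) / (P(v) * P(o)) ), with counts over `total` pairs.
            pmi = math.log(joint * total / (verb_counts[verb] * obj_counts[obj]))
            vec.append(max(pmi, 0.0))
        embeddings[obj] = vec
    return verbs, embeddings

# Toy (verb, object) pairs standing in for parsed corpus text (hypothetical data).
pairs = [("eat", "apple"), ("eat", "apple"), ("cut", "apple"), ("wash", "apple"),
         ("cut", "bread"), ("eat", "bread"), ("drive", "car"), ("wash", "car")]
verbs, emb = ppmi_embeddings(pairs)
print(verbs)
print(emb["car"])
```

In this toy setup, a verb's applicability to an object can be read off as that object's value on the verb's dimension, so "drive" scores higher for "car" than for "apple". A model closer to the abstract would additionally compress the verb dimensions into a smaller set of interaction modes, a step this sketch deliberately leaves out.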