Probing Perceptual Constancy in Large Vision-Language Models
This work addresses the problem of assessing visual understanding capabilities in AI models, a concern for researchers and developers. Its contribution is incremental: it evaluates existing models rather than proposing new methods.
The study evaluated perceptual constancy in 155 vision-language models across 236 experiments spanning the color, size, and shape domains. It found significant variability in performance across domains, with performance on shape constancy dissociated from performance on color and size constancy.
Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for visual understanding in a dynamic world. Here, we explored this ability in current Vision-Language Models (VLMs). We evaluated 155 VLMs using 236 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions. We found significant variability in VLM performance across these domains, with model performance in shape constancy clearly dissociated from that of color and size constancy.
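One way to make the notion of a cross-domain dissociation concrete is to correlate per-model accuracy between pairs of domains. The sketch below is illustrative only: the abstract does not state how the dissociation was measured, so the use of Pearson correlation is an assumption, and the function names, model names, and scores are hypothetical placeholders rather than results from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def domain_correlations(scores):
    """Correlate per-model accuracy for every pair of domains.

    `scores` maps model name -> {domain name -> accuracy in [0, 1]}.
    Returns {(domain_a, domain_b): correlation}, with domains in sorted order.
    """
    domains = sorted({d for per_model in scores.values() for d in per_model})
    models = list(scores)
    result = {}
    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            xs = [scores[m][a] for m in models]
            ys = [scores[m][b] for m in models]
            result[(a, b)] = pearson(xs, ys)
    return result


# Hypothetical toy scores, NOT data from the study.
toy_scores = {
    "model_a": {"color": 0.60, "size": 0.55, "shape": 0.80},
    "model_b": {"color": 0.70, "size": 0.68, "shape": 0.50},
    "model_c": {"color": 0.80, "size": 0.75, "shape": 0.70},
    "model_d": {"color": 0.90, "size": 0.88, "shape": 0.60},
}

if __name__ == "__main__":
    for pair, r in domain_correlations(toy_scores).items():
        print(pair, round(r, 3))
```

Under this reading, a dissociation would show up as a high correlation between color and size accuracy alongside a much weaker correlation between shape and either of the other two domains.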