CV AI · Feb 14, 2025

Probing Perceptual Constancy in Large Vision-Language Models

Haoran Sun, Bingyang Wang, Suyang Yu, Yijiang Li, Qingying Gao, Haiyun Lyu, Hokin Deng, Dezhi Luo

arXiv:2502.10273v210.26 citations · h-index: 5 · CogSci

Originality: Synthesis-oriented

AI Analysis

This work addresses how to assess visual understanding in AI models, a problem relevant to researchers and developers. Its contribution is incremental: it focuses on evaluation rather than proposing new methods.

The study evaluated perceptual constancy in 155 vision-language models across 236 experiments in color, size, and shape domains, finding significant variability and dissociation in performance between shape and other constancies.

Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for visual understanding in a dynamic world. Here, we explored this ability in current Vision-Language Models (VLMs). We evaluated 155 VLMs using 236 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions. We found significant variability in VLM performance across these domains, with model performance in shape constancy clearly dissociated from that of color and size constancy.
