CV · AI · Feb 14, 2025

Probing Perceptual Constancy in Large Vision-Language Models

arXiv:2502.10273v2 · 6 citations · h-index: 5 · CogSci
Originality: Synthesis-oriented
AI Analysis

This work assesses the visual understanding capabilities of AI models, which is useful to researchers and developers, but its contribution is incremental: it evaluates existing models rather than proposing new methods.

The study evaluated perceptual constancy in 155 vision-language models across 236 experiments in color, size, and shape domains, finding significant variability and dissociation in performance between shape and other constancies.

Perceptual constancy is the ability to maintain stable perceptions of objects despite changes in sensory input, such as variations in distance, angle, or lighting. This ability is crucial for visual understanding in a dynamic world. Here, we explored this ability in current Vision Language Models (VLMs). We evaluated 155 VLMs using 236 experiments across three domains: color, size, and shape constancy. The experiments included single-image and video adaptations of classic cognitive tasks, along with novel tasks in in-the-wild conditions. We found significant variability in VLM performance across these domains, with model performance in shape constancy clearly dissociated from that of color and size constancy.
