CV · Sep 26, 2025

Color Names in Vision-Language Models

arXiv:2509.22524v1 · 2 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This paper addresses effective human-AI interaction by assessing whether VLMs name colors as humans do. The advance is incremental, since it builds on classic color naming methodologies.

The study systematically evaluates color naming across five vision-language models. It finds high accuracy on prototypical colors but a significant drop on non-prototypical sets, identifies 21 color terms common to all models, and reports cross-linguistic imbalances favoring English and Chinese.

Color serves as a fundamental dimension of human visual perception and a primary means of communicating about objects and scenes. As vision-language models (VLMs) become increasingly prevalent, understanding whether they name colors like humans is crucial for effective human-AI interaction. We present the first systematic evaluation of color naming capabilities across VLMs, replicating classic color naming methodologies using 957 color samples across five representative models. Our results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models using predominantly basic terms versus expansive models employing systematic lightness modifiers. Cross-linguistic analysis across nine languages demonstrates severe training imbalances favoring English and Chinese, with hue serving as the primary driver of color naming decisions. Finally, ablation studies reveal that language model architecture significantly influences color naming independent of visual processing capabilities.

