CV · Sep 2, 2025

Hues and Cues: Human vs. CLIP

arXiv:2509.02305v21 citations · h-index: 25
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more nuanced evaluation methods for AI models, particularly in identifying subtle deficiencies like cultural biases, though it is incremental as it applies an existing model to a new task.

The authors address the problem of evaluating how human-like artificial models are by proposing board games as a test bed. They test CLIP's color perception and color naming by having it play Hues & Cues. They find that CLIP is generally well aligned with humans, but that the game exposes cultural biases and inconsistencies across abstraction levels.

Playing games is inherently human, and many games are designed to challenge different human characteristics. However, such tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is to propose a new approach to evaluating artificial models via board games. To this end, we test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues and assess its alignment with humans. Our experiments show that CLIP is generally well aligned with human observers, but our approach brings to light certain cultural biases and inconsistencies across abstraction levels that are hard to identify with other testing strategies. Our findings indicate that assessing models on different tasks, such as board games, can make certain deficiencies stand out in ways that are difficult to test with commonly used benchmarks.
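To make the guessing task concrete, here is a minimal sketch of how a model's guess on a color grid could be scored against a target cell. It is an illustration under stated assumptions, not the authors' pipeline: the toy board, the use of raw RGB tuples as stand-in embeddings, and the names `guess_cell` and `grid_distance` are all hypothetical. In a real setup the vectors would presumably come from CLIP's text and image encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def guess_cell(cue_vec, grid_vecs):
    """Return the (row, col) whose vector is most similar to the cue (hypothetical helper)."""
    return max(grid_vecs, key=lambda cell: cosine(cue_vec, grid_vecs[cell]))

def grid_distance(a, b):
    """Chebyshev distance between two cells: 0 is an exact hit, 1 is a neighbouring cell."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

# Toy 2x3 board (assumption): raw RGB values stand in for learned embeddings.
toy_grid = {
    (0, 0): (255, 0, 0), (0, 1): (255, 128, 0), (0, 2): (255, 255, 0),
    (1, 0): (0, 0, 255), (1, 1): (0, 128, 255), (1, 2): (0, 255, 255),
}

cue = (250, 120, 10)  # stand-in for the embedding of a cue word such as "orange"
target = (0, 1)       # cell the cue giver had in mind

guess = guess_cell(cue, toy_grid)
error = grid_distance(guess, target)
print(guess, error)
```

Comparing the distribution of such grid errors for the model with the errors made by human players is one plausible way to quantify alignment; the paper may use a different measure.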

