ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
The work addresses a gap in evaluating how text-to-image models handle probabilistic color-concept associations, but its contribution is incremental: it benchmarks the problem rather than solving it.
The authors address text-to-image models' limited ability to associate colors with implicit concepts by introducing ColorConceptBench, a human-annotated benchmark covering 1,281 concepts and 6,369 annotations. They find that current models lack sensitivity to abstract semantics and that this limitation appears resistant to scaling and guidance.
While text-to-image (T2I) models have advanced considerably, their capability to associate colors with implicit concepts remains underexplored. To address this gap, we introduce ColorConceptBench, a new human-annotated benchmark that systematically evaluates color-concept associations through the lens of probabilistic color distributions. ColorConceptBench moves beyond explicit color names or codes by probing how models translate 1,281 implicit color concepts, building on a foundation of 6,369 human annotations. Our evaluation of seven leading T2I models reveals that current models lack sensitivity to abstract semantics and, crucially, that this limitation appears resistant to standard interventions (e.g., scaling and guidance). This demonstrates that achieving human-like color semantics requires more than larger models; it demands a fundamental shift in how models learn and represent implicit meaning.
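To make the idea of comparing probabilistic color distributions concrete, here is a minimal sketch. It rests on assumptions that are not taken from the abstract: each concept is represented as a histogram of counts over a fixed set of color bins, and human and model histograms are compared with the Jensen-Shannon divergence. The abstract does not state which metric or color representation ColorConceptBench uses, so the function names, the palette, and the counts below are all hypothetical.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits between two histograms.

    Assumed metric for illustration only. With log base 2 the value lies
    in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must use the same color bins")
    p = normalize(p_counts)
    q = normalize(q_counts)
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical palette and made-up counts for one implicit concept.
palette = ["red", "orange", "yellow", "green", "blue"]
human_counts = [1, 2, 3, 10, 4]   # e.g. how often annotators chose each bin
model_counts = [6, 5, 4, 3, 2]    # e.g. dominant-color bins in generated images

score = js_divergence(human_counts, model_counts)
print(f"JS divergence between human and model distributions: {score:.3f}")
```

Under these assumptions, a lower score means the model's color usage for a concept is closer to the human distribution. A symmetric, bounded measure is convenient here because it stays finite even when one side assigns zero mass to a color bin.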