CV | Oct 23, 2025

GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models

arXiv:2510.20586v1 | 1 citation | h-index: 23
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for precise color generation in applications such as art and design. Its contribution is incremental: it introduces a new benchmark rather than a novel generation method.

The authors address fine-grained color controllability in text-to-image generation models by proposing GenColorBench, a comprehensive benchmark of 44K color-focused prompts. Evaluating popular models on it revealed performance variations and failure modes.

Recent years have seen impressive advances in text-to-image generation, with image generative or unified models producing high-quality images from text. Yet these models still struggle with fine-grained color controllability, often failing to accurately match colors specified in text prompts. While existing benchmarks evaluate compositional reasoning and prompt adherence, none systematically assess color precision. Color is fundamental to human visual perception and communication, critical for applications from art to design workflows requiring brand consistency. However, current benchmarks either neglect color or rely on coarse assessments, missing key capabilities such as interpreting RGB values or aligning with human expectations. To this end, we propose GenColorBench, the first comprehensive benchmark for text-to-image color generation, grounded in color systems like ISCC-NBS and CSS3/X11, including numerical colors which are absent elsewhere. With 44K color-focused prompts covering 400+ colors, it reveals models' true capabilities via perceptual and automated assessments. Evaluations of popular text-to-image models using GenColorBench show performance variations, highlighting which color conventions models understand best and identifying failure modes. Our GenColorBench assessments will guide improvements in precise color generation. The benchmark will be made public upon acceptance.
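The abstract does not spell out how color accuracy is scored, so the following is only a rough sketch of what an automated color check of this kind could look like. It converts sRGB values to CIELAB and measures the CIE76 color difference (Euclidean distance in Lab space) between a color requested in a prompt and a color measured from a generated image. The small color table, the function names, and the choice of CIE76 are assumptions made for illustration; they are not GenColorBench's actual metric or code.

```python
import math

# Hypothetical illustration only: not the GenColorBench evaluation code.
# A handful of CSS3/X11 named colors as 8-bit sRGB triples.
CSS_COLORS = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "navy": (0, 0, 128),
    "teal": (0, 128, 128),
}


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB under the D65 white point."""

    def to_linear(channel):
        # Undo the sRGB transfer function to recover linear light.
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    # D65 reference white used to normalize XYZ.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        # Piecewise cube-root function from the CIELAB definition.
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb_a, rgb_b):
    """CIE76 color difference: Euclidean distance between two Lab colors."""
    return math.dist(srgb_to_lab(rgb_a), srgb_to_lab(rgb_b))


def nearest_css_name(rgb):
    """Return the name in CSS_COLORS with the smallest CIE76 distance to rgb."""
    return min(CSS_COLORS, key=lambda name: delta_e76(rgb, CSS_COLORS[name]))


if __name__ == "__main__":
    target = CSS_COLORS["crimson"]  # color requested in the prompt
    measured = (210, 30, 70)        # made-up color sampled from an image
    print(f"Delta E (CIE76): {delta_e76(target, measured):.2f}")
    print("Nearest named color:", nearest_css_name(measured))
```

A distance computed in CIELAB tracks perceived difference more closely than a distance on raw RGB values, which is why a sketch like this works in Lab space. CIE76 is the simplest such formula; a more refined one such as CIEDE2000 could be substituted without changing the overall structure.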

