CV, CL | Sep 25, 2025

Hallucination as an Upper Bound: A New Perspective on Text-to-Image Evaluation

arXiv:2509.21257v2 | h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the need for more comprehensive evaluation of text-to-image generation and is aimed mainly at researchers and developers. Its contribution is incremental: it adapts the notion of hallucination already established for language and vision-language models.

The paper addresses the evaluation of text-to-image generative models. It defines hallucination as bias-driven deviations that go beyond prompt alignment and proposes a taxonomy with three categories: attribute, relation, and object hallucinations. This framing introduces an upper bound for evaluation that surfaces hidden biases, providing a foundation for richer assessment.

In language and vision-language models, hallucination is broadly understood as content generated from a model's prior knowledge or biases rather than from the given input. While this phenomenon has been studied in those domains, it has not been clearly framed for text-to-image (T2I) generative models. Existing evaluations mainly focus on alignment, checking whether prompt-specified elements appear, but overlook what the model generates beyond the prompt. We argue for defining hallucination in T2I as bias-driven deviations and propose a taxonomy with three categories: attribute, relation, and object hallucinations. This framing introduces an upper bound for evaluation and surfaces hidden biases, providing a foundation for richer assessment of T2I models.
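
To make the distinction between alignment and content beyond the prompt concrete, here is a minimal Python sketch. It is not the paper's method: the `SceneElements` structure, the function names, and the example prompt and image annotations are all hypothetical, and the sketch assumes that some upstream step (for example a detector or human annotation) has already extracted objects, attributes, and relations. It also only flags extra elements as candidates; deciding whether an extra element is actually bias-driven is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneElements:
    """Hypothetical structured description of a prompt or a generated image."""
    objects: frozenset = frozenset()      # e.g. "nurse"
    attributes: frozenset = frozenset()   # (object, attribute) pairs
    relations: frozenset = frozenset()    # (subject, relation, object) triples


def alignment_score(prompt: SceneElements, image: SceneElements) -> float:
    """Fraction of prompt-specified elements that appear in the image."""
    specified = prompt.objects | prompt.attributes | prompt.relations
    found = image.objects | image.attributes | image.relations
    if not specified:
        return 1.0  # nothing was requested, so nothing can be missing
    return len(specified & found) / len(specified)


def extra_elements(prompt: SceneElements, image: SceneElements) -> dict:
    """Image elements not specified by the prompt, grouped by taxonomy category.

    These are only candidates for hallucination (an assumption of this sketch).
    """
    return {
        "object": image.objects - prompt.objects,
        "attribute": image.attributes - prompt.attributes,
        "relation": image.relations - prompt.relations,
    }


# Hypothetical example: prompt "a nurse holding a clipboard".
prompt = SceneElements(
    objects=frozenset({"nurse", "clipboard"}),
    relations=frozenset({("nurse", "holding", "clipboard")}),
)
# Hypothetical annotations of one generated image.
image = SceneElements(
    objects=frozenset({"nurse", "clipboard", "stethoscope"}),
    attributes=frozenset({("nurse", "female")}),
    relations=frozenset({
        ("nurse", "holding", "clipboard"),
        ("stethoscope", "around", "nurse"),
    }),
)

score = alignment_score(prompt, image)
extras = extra_elements(prompt, image)
```

In this made-up case the image is fully aligned with the prompt, yet it still contains one unrequested element in each of the three categories. An alignment-only evaluation would not report any of them.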

