CV · LG · Nov 13, 2025

Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation

arXiv:2511.10547v1 · 3 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the problem of homogeneous outputs in text-to-image generation and offers researchers and developers an incremental improvement in evaluation methods.

This work tackles the lack of diversity in text-to-image models by introducing a framework for systematic diversity evaluation. The framework includes a human evaluation template and a curated prompt set, and it enables models to be ranked and their weak categories identified.

Despite advances in generation quality, current text-to-image (T2I) models often lack diversity and generate homogeneous outputs. This work introduces a framework that addresses the need for robust diversity evaluation in T2I models. Our framework systematically assesses diversity by evaluating individual concepts and their relevant factors of variation. Key contributions include: (1) a novel human evaluation template for nuanced diversity assessment; (2) a curated prompt set covering diverse concepts with their identified factors of variation (e.g., prompt: "An image of an apple", factor of variation: color); and (3) a methodology for comparing models on human annotations via binomial tests. Furthermore, we rigorously compare various image embeddings for diversity measurement. Notably, our principled approach enables ranking T2I models by diversity and identifying the categories where they particularly struggle. This research offers a robust methodology and insights, paving the way for improvements in T2I model diversity and metric development.
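The abstract names binomial tests as the tool for comparing models from human annotations but does not spell out the protocol. The sketch below assumes a simple pairwise setup: for a given prompt and factor of variation, an annotator sees image sets from model A and model B and labels which set is more diverse ("A", "B", or "tie"). Ties are discarded, and an exact two-sided binomial test checks the null hypothesis that each model wins with probability 0.5. The function names, the label strings, and the tie handling are hypothetical illustrations, not the authors' exact method; `scipy.stats.binomtest` offers the same test if SciPy is available.

```python
from math import comb


def binomial_two_sided_p(wins: int, n: int) -> float:
    """Exact two-sided binomial test of H0: P(win) = 0.5."""
    if not 0 <= wins <= n:
        raise ValueError("wins must be between 0 and n")
    if n == 0:
        return 1.0  # no informative judgments, so no evidence either way
    # Probability of exactly i wins out of n under a fair Bernoulli(0.5) process
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    lower = sum(pmf[: wins + 1])  # P(X <= wins)
    upper = sum(pmf[wins:])       # P(X >= wins)
    # H0 is symmetric, so the two-sided p-value doubles the smaller tail
    return min(1.0, 2 * min(lower, upper))


def compare_models(judgments, alpha=0.05):
    """Compare two models from hypothetical 'A' / 'B' / 'tie' annotations."""
    wins_a = sum(1 for j in judgments if j == "A")
    wins_b = sum(1 for j in judgments if j == "B")
    n = wins_a + wins_b  # assumption: ties are dropped before testing
    p = binomial_two_sided_p(wins_a, n)
    if p >= alpha:
        verdict = "no significant difference"
    else:
        verdict = "A more diverse" if wins_a > wins_b else "B more diverse"
    return wins_a, wins_b, p, verdict


if __name__ == "__main__":
    # 14 wins for A, 4 for B, 2 ties: p = 2 * 4048 / 2**18, roughly 0.031
    print(compare_models(["A"] * 14 + ["B"] * 4 + ["tie"] * 2))
```

Running one such test per prompt and factor of variation is one way a setup like this could surface the categories where a model falls behind. With many categories, a multiple-comparison correction would likely be needed before reading individual verdicts.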
