CV
Oct 29, 2024

GRADE: Quantifying Sample Diversity in Text-to-Image Models

arXiv:2410.22592v2
6 citations
h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses low sample diversity in text-to-image models and is aimed at researchers and developers. Its findings on output homogeneity and default behaviors are an incremental insight into model limitations.

The authors introduce GRADE, an automatic method for quantifying sample diversity in text-to-image models. It uses large language models and visual question-answering systems to estimate attribute distributions and measures diversity with entropy. Applied to 12 models over a total of 720K images, it shows limited variation in every model, with diversity deteriorating in stronger models, and it surfaces default behaviors such as 98% of cookies being round.

We introduce GRADE, an automatic method for quantifying sample diversity in text-to-image models. Our method leverages the world knowledge embedded in large language models and visual question-answering systems to identify relevant concept-specific axes of diversity (e.g., "shape" for the concept "cookie"). It then estimates frequency distributions of concepts and their attributes and quantifies diversity using entropy. We use GRADE to measure the diversity of 12 models over a total of 720K images, revealing that all models display limited variation, with clear deterioration in stronger models. Further, we find that models often exhibit default behaviors, a phenomenon where a model consistently generates concepts with the same attributes (e.g., 98% of the cookies are round). Lastly, we show that a key reason for low diversity is underspecified captions in training data. Our work proposes an automatic, semantically-driven approach to measure sample diversity and highlights the stunning homogeneity in text-to-image models.
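The entropy step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `attribute_entropy`, the use of base-2 Shannon entropy, and the normalization by the log of the number of possible attribute values are all assumptions made here for illustration. The input is assumed to be one attribute label per generated image, such as the answers a visual question-answering system might give to "What shape is the cookie?".

```python
import math
from collections import Counter


def attribute_entropy(answers, num_values=None):
    """Return (entropy_bits, normalized_entropy) for a list of attribute labels.

    `answers` holds one label per image (hypothetical VQA outputs).
    `num_values` is the assumed number of possible attribute values; when
    omitted, the number of distinct labels observed is used instead.
    """
    counts = Counter(answers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("answers must not be empty")

    # Empirical frequency distribution over attribute values.
    probs = [count / total for count in counts.values()]

    # Shannon entropy in bits; zero-probability values never appear in `counts`.
    entropy = -sum(p * math.log2(p) for p in probs)

    k = num_values if num_values is not None else len(counts)
    if k < 2:
        # With a single possible value there is no diversity to measure.
        return entropy, 0.0

    # Assumed normalization: divide by the maximum entropy over k values.
    return entropy, entropy / math.log2(k)


# A default behavior like the one in the abstract: 98% of cookies are round.
cookie_shapes = ["round"] * 98 + ["square"] * 2
bits, normalized = attribute_entropy(cookie_shapes)
print(f"entropy = {bits:.3f} bits, normalized = {normalized:.3f}")
```

Under these assumptions, a distribution dominated by one value yields an entropy close to zero, while a uniform distribution over the possible values yields a normalized score of 1.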

