CV · Jul 12, 2023

T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation

arXiv:2307.06350v3 · 238 citations · h-index: 57
Originality: Synthesis-oriented
AI Analysis

The paper gives researchers and developers a comprehensive benchmark for evaluating and improving compositional text-to-image generation, though the contribution is incremental because it builds on existing benchmarks.

The paper addresses the difficulty text-to-image models have with complex compositional scenes. It introduces T2I-CompBench++, an enhanced benchmark of 8,000 prompts spanning categories such as attribute binding and 3D-spatial relationships, and proposes new evaluation metrics, including detection-based methods and MLLM-based evaluation. It benchmarks 11 text-to-image models, including state-of-the-art ones such as FLUX.1 and SD3.

Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects that display various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones such as 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis that uses Multimodal Large Language Models (MLLMs), namely GPT-4V and ShareGPT4V, as evaluation metrics. Our experiments benchmark 11 text-to-image models on T2I-CompBench++, including state-of-the-art models such as FLUX.1, SD3, DALL-E 3, PixArt-$\alpha$, and SD-XL. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and explore the potential and limitations of MLLMs.
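
The abstract mentions a detection-based metric for numeracy but does not spell out its formula here. As a rough illustration only, the sketch below shows one way such a score could be computed: it assumes an object detector has already produced a list of class labels for a generated image, and it scores the fraction of object classes in the prompt whose detected count matches the requested count exactly. The function name `numeracy_score`, the exact-match rule, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def numeracy_score(expected_counts, detected_labels):
    """Hypothetical numeracy score (an assumption, not the paper's metric).

    expected_counts: mapping of object class -> count requested by the prompt.
    detected_labels: class labels returned by some object detector, one per
                     detected instance.
    Returns the fraction of requested classes whose detected count matches
    the requested count exactly.
    """
    if not expected_counts:
        raise ValueError("expected_counts must not be empty")
    detected = Counter(detected_labels)  # missing classes count as 0
    matches = sum(
        1 for name, count in expected_counts.items() if detected[name] == count
    )
    return matches / len(expected_counts)


# Prompt: "three apples and two dogs"; detector output is made up for illustration.
expected = {"apple": 3, "dog": 2}
labels = ["apple", "apple", "apple", "dog"]
score = numeracy_score(expected, labels)
print(score)
```

In this made-up case the apples are counted correctly and the dogs are not, so half of the requested classes match. A real metric would also have to handle detector confidence thresholds and class-name mismatches, which this sketch ignores.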

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
