CV, CL | May 30, 2025

Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation

Yucheng Zhou, Jiahao Yuan, Qianning Wang

arXiv:2505.24787v127.229 citations | h-index: 21 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in text-to-image generation: producing detailed, accurate images from complex prompts. The contribution is incremental, since it builds on existing models.

The paper addresses the difficulty text-to-image models have with complex instructions. It introduces LongBench-T2I, a benchmark of 500 prompts spanning nine evaluation dimensions, and Plan2Gen, an agent framework that improves generation without retraining, with reported gains on multi-object and spatial tasks.

Recent advancements in text-to-image (T2I) generation have enabled models to produce high-quality images from textual descriptions. However, these models often struggle with complex instructions involving multiple objects, attributes, and spatial relationships. Existing benchmarks for evaluating T2I models primarily focus on general text-image alignment and fail to capture the nuanced requirements of complex, multi-faceted prompts. Given this gap, we introduce LongBench-T2I, a comprehensive benchmark specifically designed to evaluate T2I models under complex instructions. LongBench-T2I consists of 500 intricately designed prompts spanning nine diverse visual evaluation dimensions, enabling a thorough assessment of a model's ability to follow complex instructions. Beyond benchmarking, we propose an agent framework (Plan2Gen) that facilitates complex instruction-driven image generation without requiring additional model training. This framework integrates seamlessly with existing T2I models, using large language models to interpret and decompose complex prompts, thereby guiding the generation process more effectively. As existing evaluation metrics, such as CLIPScore, fail to adequately capture the nuances of complex instructions, we introduce an evaluation toolkit that automates the quality assessment of generated images using a set of multi-dimensional metrics. The data and code are released at https://github.com/yczhou001/LongBench-T2I.
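The decompose-then-generate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Plan2Gen implementation: the names `Plan`, `decompose`, `plan_then_generate`, `fake_llm`, and `fake_t2i` are hypothetical, and the LLM and T2I model are stand-in callables so the example runs without any external service. For the actual framework, see the released repository.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    """Hypothetical container for a decomposed prompt (not from the paper)."""
    sub_instructions: List[str]


def decompose(prompt: str, llm: Callable[[str], str]) -> Plan:
    """Ask a text-in, text-out LLM callable to list one requirement per line."""
    reply = llm("List each visual requirement on its own line:\n" + prompt)
    # Drop list markers and surrounding whitespace, skip empty lines.
    steps = [line.strip("- ").strip() for line in reply.splitlines()]
    return Plan([s for s in steps if s])


def plan_then_generate(
    prompt: str,
    llm: Callable[[str], str],
    t2i: Callable[[str], str],
) -> str:
    """Decompose the prompt, then pass the structured version to a T2I callable."""
    plan = decompose(prompt, llm)
    structured = "; ".join(plan.sub_instructions)
    return t2i(structured)


# Stand-ins (assumptions) so the sketch is runnable without a real model.
def fake_llm(request: str) -> str:
    body = request.split("\n", 1)[1]
    return "\n".join("- " + part.strip() for part in body.split(","))


def fake_t2i(structured_prompt: str) -> str:
    return f"<image for: {structured_prompt}>"


if __name__ == "__main__":
    print(plan_then_generate("a red cube, left of a blue sphere", fake_llm, fake_t2i))
```

Passing the models in as plain callables mirrors the abstract's claim that the framework works with existing T2I models without additional training: only the prompt handed to the generator changes.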

