CV · May 2, 2025

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Daoan Zhang, Che Jiang, Ruoshi Xu, Biaoxiang Chen, Zijian Jin, Yutian Lu, Jianguo Zhang, Liang Yong, Jiebo Luo, Shengda Luo

arXiv:2505.01490v121.715 citations · h-index: 8 · Has Code

Originality: Synthesis-oriented

AI Analysis

WorldGenBench addresses the need for better reasoning in text-to-image generation for real-world applications, but its contribution is incremental: it offers a benchmark rather than a new model.

The authors address the difficulty text-to-image models have with prompts that require world knowledge and implicit reasoning. They introduce WorldGenBench, an evaluation benchmark, and find that proprietary auto-regressive models such as GPT-4o significantly outperform diffusion models in reasoning and knowledge integration.

Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models still struggle with prompts that require rich world knowledge and implicit reasoning, both of which are critical for producing semantically accurate, coherent, and contextually appropriate images in real-world scenarios. To address this gap, we introduce WorldGenBench, a benchmark designed to systematically evaluate T2I models' world knowledge grounding and implicit inferential capabilities, covering both the humanities and nature domains. We propose the Knowledge Checklist Score, a structured metric that measures how well generated images satisfy key semantic expectations. Experiments across 21 state-of-the-art models reveal that while diffusion models lead among open-source methods, proprietary auto-regressive models like GPT-4o exhibit significantly stronger reasoning and knowledge integration. Our findings highlight the need for deeper understanding and inference capabilities in next-generation T2I systems. Project Page: https://dwanzhang-ai.github.io/WorldGenBench/
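
The abstract describes the Knowledge Checklist Score only as a structured metric measuring how well an image satisfies key semantic expectations; it does not give a formula. As an illustration, here is a minimal sketch that assumes the score is the fraction of checklist items judged satisfied. The `ChecklistItem` type, the `knowledge_checklist_score` function, the example prompt, and the verdicts are all hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One semantic expectation for a generated image (hypothetical structure)."""
    description: str
    satisfied: bool  # assumed to come from some judge, e.g. a human or a vision-language model


def knowledge_checklist_score(items: list[ChecklistItem]) -> float:
    """Assumed scoring rule: share of checklist items the image satisfies, in [0, 1]."""
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(item.satisfied for item in items) / len(items)


# Made-up checklist for a prompt that requires implicit world knowledge,
# e.g. "a street scene in Kyoto during cherry blossom season".
checklist = [
    ChecklistItem("cherry trees are in bloom", True),
    ChecklistItem("architecture is recognizably Japanese", True),
    ChecklistItem("season reads as spring, not winter", True),
    ChecklistItem("signage uses Japanese script", False),
]

score = knowledge_checklist_score(checklist)
print(f"Knowledge Checklist Score: {score:.2f}")
```

Under this assumption, a model's benchmark result would be an average of such per-image scores over all prompts; the paper itself should be consulted for the actual definition and judging procedure.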
