CV · May 30, 2025

GenSpace: Benchmarking Spatially-Aware Image Generation

arXiv:2505.24870v2 · 5 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the need for better evaluation of spatial intelligence in image generation models, which matters for applications such as photography and scene composition. The contribution is incremental, since it offers a benchmark rather than a new generation method.

The paper tackles the problem of evaluating 3D spatial awareness in AI image generators by introducing GenSpace, a benchmark and evaluation pipeline that uses 3D scene reconstruction to measure spatial faithfulness. It finds that current models struggle with object placement and spatial relationships even when they produce visually appealing images.

Humans can intuitively compose and arrange scenes in 3D space for photography. However, can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace, a novel benchmark and evaluation pipeline to comprehensively assess the spatial awareness of current image generation models. Furthermore, standard evaluations using general Vision-Language Models (VLMs) frequently fail to capture detailed spatial errors. To handle this challenge, we propose a specialized evaluation pipeline and metric, which reconstructs 3D scene geometry using multiple visual foundation models and provides a more accurate and human-aligned measure of spatial faithfulness. Our findings show that while AI models create visually appealing images and can follow general instructions, they struggle with specific 3D details such as object placement, relationships, and measurements. We summarize three core limitations in the spatial perception of current state-of-the-art image generation models: 1) Object Perspective Understanding, 2) Egocentric-Allocentric Transformation, and 3) Metric Measurement Adherence, highlighting possible directions for improving spatial intelligence in image generation.
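The abstract does not say how the reconstructed geometry is turned into a score. As an illustration only, the sketch below assumes the 3D pipeline has already produced a center point and a heading for each object in a camera frame (+x right, +y up, +z away from the camera), and checks three kinds of constraints that loosely mirror the limitations listed above: an egocentric relation (camera viewpoint), an allocentric relation (another object's viewpoint), and a metric distance. The `Object3D` record, the function names, the axis convention and the 10% tolerance are all hypothetical, not GenSpace's actual metric.

```python
import math
from dataclasses import dataclass


@dataclass
class Object3D:
    # Hypothetical record for one object after 3D reconstruction.
    # Assumed camera frame: +x right, +y up, +z forward (away from the camera).
    name: str
    center: tuple[float, float, float]
    facing_yaw: float = 0.0  # heading in the x-z plane, radians; 0 = facing +z


def egocentric_left_of(a: Object3D, b: Object3D) -> bool:
    """'a is left of b' as seen from the camera: a has the smaller x."""
    return a.center[0] < b.center[0]


def allocentric_left_of(a: Object3D, ref: Object3D) -> bool:
    """'a is on ref's left' from ref's own viewpoint.

    ref's forward vector is (sin yaw, 0, cos yaw), so its right vector is
    (cos yaw, 0, -sin yaw). A negative projection of a's offset onto that
    right vector means a lies on ref's left.
    """
    yaw = ref.facing_yaw
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    offset = [ac - rc for ac, rc in zip(a.center, ref.center)]
    return sum(o * r for o, r in zip(offset, right)) < 0.0


def meets_metric(a: Object3D, b: Object3D, target_m: float,
                 rel_tol: float = 0.1) -> bool:
    """True if the reconstructed a-b distance is within rel_tol of target_m."""
    return abs(math.dist(a.center, b.center) - target_m) <= rel_tol * target_m


if __name__ == "__main__":
    ball = Object3D("ball", (-1.0, 0.0, 5.0))
    # The person faces the camera (yaw = pi), so their left is the camera's right.
    person = Object3D("person", (1.0, 0.0, 5.0), facing_yaw=math.pi)
    print(egocentric_left_of(ball, person))   # left of the person in the image
    print(allocentric_left_of(ball, person))  # but on the person's right
    print(meets_metric(ball, person, 2.0))    # prompt asked for 2 m apart
```

The example shows why the egocentric and allocentric checks can disagree for the same image: a prompt such as "a ball on the person's left" is only satisfied if the model accounts for which way the person faces, which is the kind of viewpoint transformation the abstract reports current models getting wrong.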
