AI, Jul 10, 2025

FloorplanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations

Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka

arXiv:2507.07644v214.711 citations, h-index: 14, Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a blind spot in LLMs: inconsistent spatial reasoning, which matters for practical applications such as indoor layout manipulation. Its contribution is incremental, however, because it focuses on diagnostic benchmarking.

The authors evaluate spatial reasoning in large language models (LLMs) by introducing FloorplanQA, a benchmark built on structured indoor scene representations. They find that models often fail to respect physical constraints and preserve spatial coherence, though they remain mostly robust to small spatial perturbations.

We introduce FloorplanQA, a diagnostic benchmark for evaluating spatial reasoning in large language models (LLMs). FloorplanQA is grounded in structured representations of indoor scenes (e.g., kitchens, living rooms, bedrooms, bathrooms, and others), encoded symbolically in JSON or XML layouts. The benchmark covers core spatial tasks, including distance measurement, visibility, path finding, and object placement within constrained spaces. Our results across a variety of frontier open-source and commercial LLMs reveal that while models may succeed in shallow queries, they often fail to respect physical constraints and preserve spatial coherence, though they remain mostly robust to small spatial perturbations. FloorplanQA uncovers a blind spot in today's LLMs: inconsistent reasoning about indoor layouts. We hope this benchmark inspires new work on language models that can accurately infer and manipulate spatial and geometric properties in practical settings.
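To make the task types in the abstract concrete, here is a minimal Python sketch of what a distance query and a placement check over a JSON layout might look like. The schema (`room`, `objects`, `x`, `y`, `w`, `d`) and the helper names are assumptions made for illustration only; they are not FloorplanQA's actual format or evaluation code, and the sketch assumes axis-aligned rectangular footprints measured in meters.

```python
import json
import math

# Hypothetical layout: field names are illustrative, not the benchmark's schema.
# Each object is an axis-aligned rectangle with origin (x, y), width w, depth d.
LAYOUT_JSON = """
{
  "room": {"type": "bedroom", "width": 4.0, "depth": 3.0},
  "objects": [
    {"id": "bed", "x": 0.0, "y": 0.0, "w": 2.0, "d": 1.5},
    {"id": "desk", "x": 3.0, "y": 0.0, "w": 1.0, "d": 0.6},
    {"id": "wardrobe", "x": 0.0, "y": 2.4, "w": 1.2, "d": 0.6}
  ]
}
"""


def center(obj):
    """Return the center point of an object's footprint."""
    return (obj["x"] + obj["w"] / 2, obj["y"] + obj["d"] / 2)


def center_distance(a, b):
    """Euclidean distance between two object centers (distance measurement)."""
    (ax, ay), (bx, by) = center(a), center(b)
    return math.hypot(ax - bx, ay - by)


def overlaps(a, b):
    """True if two footprints share interior area; touching edges do not count."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["d"] and b["y"] < a["y"] + a["d"])


def fits(room, placed, candidate):
    """Placement check: candidate lies inside the room and hits no placed object."""
    inside = (candidate["x"] >= 0 and candidate["y"] >= 0
              and candidate["x"] + candidate["w"] <= room["width"]
              and candidate["y"] + candidate["d"] <= room["depth"])
    return inside and not any(overlaps(candidate, o) for o in placed)


layout = json.loads(LAYOUT_JSON)
objects = {o["id"]: o for o in layout["objects"]}
chair = {"id": "chair", "x": 2.0, "y": 2.0, "w": 0.5, "d": 0.5}

print(round(center_distance(objects["bed"], objects["desk"]), 2))
print(fits(layout["room"], layout["objects"], chair))
```

A reference checker along these lines is one way an answer to a placement question could be validated against physical constraints, the kind of constraint the abstract reports models often violating.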

