CV, AI | Sep 29, 2025

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

Yuyou Zhang, Radu Corcodel, Chiori Hori, Anoop Cherian, Ding Zhao

arXiv:2509.25390v113.14 citations | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of assessing spatial reasoning in VLMs, which matters for improving their ability to reason about physical space. It is rated incremental because it builds on existing benchmarking efforts.

The authors introduce SpinBench, a diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs) that focuses on perspective taking. Evaluating 37 VLMs reveals systematic weaknesses, including egocentric bias and poor rotational understanding. Human subjects reach 91.2% accuracy, and task difficulty as measured by human response time correlates strongly with VLM accuracy.

We present SpinBench, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision language models (VLMs). SpinBench is designed around the core challenge of spatial reasoning: perspective taking, the ability to reason about how scenes and object relations change under viewpoint transformation. Since perspective taking requires multiple cognitive capabilities, such as recognizing objects across views, grounding relative positions, and mentally simulating transformations, SpinBench introduces a set of fine-grained diagnostic categories. Our categories target translation, rotation, object relative pose, and viewpoint change, and are progressively structured so that simpler single-object tasks scaffold toward the most demanding multi-object perspective-taking setting. We evaluate 37 state-of-the-art VLMs, both proprietary and open source. Results reveal systematic weaknesses: strong egocentric bias, poor rotational understanding, and inconsistencies under symmetrical and syntactic reformulations. Scaling analysis shows both smooth improvements and emergent capabilities. While human subjects achieve high accuracy (91.2%), task difficulty as measured by human response time shows strong correlation with VLM accuracy, indicating that SpinBench captures spatial reasoning challenges shared across humans and VLMs. We believe SpinBench provides critical insights into spatial reasoning in VLMs and highlights key gaps in their ability to reason about physical space. Our website can be found at https://spinbench25.github.io/.
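The link between human response time and VLM accuracy can be illustrated with a correlation coefficient computed over per-task aggregates. The sketch below is a minimal illustration and not the authors' analysis: it assumes Pearson correlation (the abstract does not name the statistic), and the helper `pearson` and the arrays `human_rt_seconds` and `vlm_accuracy` are hypothetical, filled with placeholder values that do not come from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder per-task values, NOT results from the paper:
# mean human response time (seconds) and mean VLM accuracy per task.
human_rt_seconds = [4.0, 6.5, 9.0, 12.0]
vlm_accuracy = [0.90, 0.75, 0.60, 0.45]

r = pearson(human_rt_seconds, vlm_accuracy)
print(f"Pearson r = {r:.3f}")
```

With these placeholder numbers the coefficient is negative, meaning tasks that take humans longer are the ones where the hypothetical VLM scores lower. A rank-based statistic such as Spearman's could be substituted if the relationship is monotonic but not linear.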
