CV · May 29
Thinking in Structures: Evaluating Spatial Intelligence in Constraint-Governed Spaces
Chen Yang, Guanxin Lin, Youquan He et al.
Spatial intelligence is crucial for vision-language models (VLMs), yet many scene-centric benchmarks evaluate unconstrained environments where a single image may admit multiple plausible 3D interpretations. We introduce SSI-Bench, a VQA benchmark for Structure-Centric Spatial Reasoning (SCSR) in constraint-governed spaces. Built from complex real-world 3D structures, it uses structural constraints from geometry, topology, and physical feasibility to make component relations more determinate from visual evidence. The benchmark contains 1,000 ranking questions spanning geometric and topological reasoning, where correct ordering requires resolving all candidate-wise 3D relations, imposing stronger demands on spatial understanding. It is created through a fully human-centered pipeline with over 400 researcher-hours of image curation, component annotation, and question design. Evaluating 31 VLMs reveals a large gap to humans: the best open-source model achieves 22.2% accuracy and the strongest closed-source model reaches 33.6%, while humans score 91.6%. Further results show that chain-of-thought reasoning brings only marginal gains, and error analysis reveals fundamental limitations in current models' spatial understanding within constraint-governed spaces. Project page: https://ssi-bench.github.io.
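To illustrate why ranking questions are demanding, here is a minimal sketch of an all-or-nothing scorer, in which a prediction counts only if the entire ordering matches the reference. The exact-match criterion, the function names, and the toy candidate labels are assumptions made for illustration; the abstract does not describe SSI-Bench's actual scoring code.

```python
from typing import Sequence


def ranking_is_correct(predicted: Sequence[str], reference: Sequence[str]) -> bool:
    """Return True only if the predicted ordering matches the reference exactly.

    Assumption: a ranking question is scored all-or-nothing, so a single
    misplaced candidate makes the whole answer wrong.
    """
    return list(predicted) == list(reference)


def exact_match_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of questions whose full ordering is predicted correctly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        ranking_is_correct(pred, ref) for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical toy data: two questions, each ranking three labeled components.
toy_references = [["A", "B", "C"], ["A", "B", "C"]]
toy_predictions = [["A", "B", "C"], ["B", "A", "C"]]  # second swaps one pair
toy_accuracy = exact_match_accuracy(toy_predictions, toy_references)
```

Under this criterion, the second toy prediction earns no credit even though it orders one of the three candidate pairs correctly relative to the rest, which is one way a ranking format can be stricter than a pairwise or multiple-choice one.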
LG · Sep 28, 2025
DRIK: Distribution-Robust Inductive Kriging without Information Leakage
Chen Yang, Changhao Zhao, Chen Wang et al.
Inductive kriging supports high-resolution spatio-temporal estimation with sparse sensor networks, but conventional training-evaluation setups often suffer from information leakage and poor out-of-distribution (OOD) generalization. We find that the common 2x2 spatio-temporal split allows test data to influence model selection through early stopping, obscuring the true OOD characteristics of inductive kriging. To address this issue, we propose a 3x3 partition that cleanly separates training, validation, and test sets, eliminating leakage and better reflecting real-world applications. Building on this redefined setting, we introduce DRIK, a Distribution-Robust Inductive Kriging approach designed with the intrinsic properties of inductive kriging in mind to explicitly enhance OOD generalization, employing a three-tier strategy at the node, edge, and subgraph levels. DRIK perturbs node coordinates to capture continuous spatial relationships, drops edges to reduce ambiguity in information flow and increase topological diversity, and adds pseudo-labeled subgraphs to strengthen domain generalization. Experiments on six diverse spatio-temporal datasets show that DRIK consistently outperforms existing methods, achieving up to 12.48% lower MAE while maintaining strong scalability.
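One plausible reading of the 3x3 partition is sketched below: sensors (nodes) are divided into three disjoint groups and the timeline into three disjoint blocks, so that early stopping can rely on a validation block that shares neither nodes nor time steps with the test block. The split fractions, the random node assignment, the contiguous time blocks, and every name in the code are assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple


def three_by_three_split(
    num_nodes: int,
    num_steps: int,
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> Tuple[Dict[str, List[int]], Dict[str, range]]:
    """Split nodes and time steps into disjoint train/val/test parts.

    Assumptions (not from the source): nodes are assigned at random,
    time is cut into contiguous blocks, and both axes use the same fractions.
    """
    rng = random.Random(seed)

    # Spatial axis: shuffle node indices, then cut into three disjoint groups.
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)
    n_train = round(fractions[0] * num_nodes)
    n_val = round(fractions[1] * num_nodes)
    node_split = {
        "train": sorted(nodes[:n_train]),
        "val": sorted(nodes[n_train:n_train + n_val]),
        "test": sorted(nodes[n_train + n_val:]),
    }

    # Temporal axis: contiguous, non-overlapping blocks in chronological order.
    t_train = round(fractions[0] * num_steps)
    t_val = round(fractions[1] * num_steps)
    time_split = {
        "train": range(0, t_train),
        "val": range(t_train, t_train + t_val),
        "test": range(t_train + t_val, num_steps),
    }
    return node_split, time_split


# Hypothetical sizes: 10 sensors observed over 100 time steps.
node_split, time_split = three_by_three_split(num_nodes=10, num_steps=100)
```

With such a split, model selection would use only the `"val"` nodes and time block, and the `"test"` nodes and block would be touched once for the final report. By contrast, a 2x2 setup has no held-out block other than the test data on which to base early stopping.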