CV · Mar 6

What if? Emulative Simulation with World Models for Situated Reasoning

arXiv:2603.06445v1 · 1 citation
Predicted impact: top 12% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses the problem of enabling agents to reason spatially without physical exploration. The advance is incremental, as it builds on existing world models and datasets.

The paper tackles situated reasoning in scenarios where active exploration is infeasible by introducing WanderDream, a large-scale dataset for emulative simulation. It reports that world models achieve compelling performance on the dataset and that imagination facilitates reasoning on it.

Situated reasoning often relies on active exploration, yet in many real-world scenarios such exploration is infeasible due to physical constraints of robots or safety concerns of visually impaired users. Given only a limited observation, can an agent mentally simulate a future trajectory toward a target situation and answer spatial what-if questions? We introduce WanderDream, the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. WanderDream-Gen comprises 15.8K panoramic videos across 1,088 real scenes from HM3D, ScanNet++, and real-world captures, depicting imagined trajectories from current viewpoints to target situations. WanderDream-QA contains 158K question-answer pairs, covering starting states, paths, and end states along each trajectory to comprehensively evaluate exploration-based reasoning. Extensive experiments with world models and MLLMs demonstrate (1) that mental exploration is essential for situated reasoning, (2) that world models achieve compelling performance on WanderDream-Gen, (3) that imagination substantially facilitates reasoning on WanderDream-QA, and (4) that WanderDream data exhibit remarkable transferability to real-world scenarios. The source code and all data will be released.
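
As a rough sanity check on the dataset sizes quoted in the abstract, the short sketch below derives two average densities from them. The counts are the rounded figures from the abstract ("15.8K", "158K"), so the results are approximate. The helper name `per_unit` is purely illustrative and is not part of any released WanderDream code. The averages also say nothing about how QA pairs are distributed across individual videos.

```python
# Rounded counts as quoted in the abstract; exact values may differ.
NUM_VIDEOS = 15_800      # WanderDream-Gen panoramic videos ("15.8K")
NUM_SCENES = 1_088       # real scenes (HM3D, ScanNet++, real-world captures)
NUM_QA_PAIRS = 158_000   # WanderDream-QA question-answer pairs ("158K")


def per_unit(total: int, units: int) -> float:
    """Return the average number of items per unit (illustrative helper)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total / units


qa_per_video = per_unit(NUM_QA_PAIRS, NUM_VIDEOS)
videos_per_scene = per_unit(NUM_VIDEOS, NUM_SCENES)

print(f"QA pairs per video (approx.): {qa_per_video:.1f}")
print(f"Videos per scene (approx.): {videos_per_scene:.1f}")
```

Under these rounded figures, the dataset averages about ten QA pairs per trajectory video and roughly 14 to 15 videos per scene.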

