CV · Mar 13

Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models

arXiv:2603.13215 · 37.03 citations · h-index: 4
Predicted impact: top 4% in CV · last 90 days · Originality: Synthesis-oriented
AI Analysis

For AI researchers, this work addresses a foundational issue in video world models and provides a new evaluation protocol for detecting their biases. It is incremental, however, in that it focuses on benchmarking rather than proposing a new method.

The paper asks whether video world models can decouple state evolution from observation. To test this, it introduces STEVO-Bench, a benchmark that applies observation control to evolving processes, and exposes the models' limitations in continuing natural evolutions such as water pouring or ice melting when those processes are not directly observed.

Evolutions in the world, such as water pouring or ice melting, happen regardless of being observed. Video world models generate "worlds" via 2D frame observations. Can these generated "worlds" evolve regardless of observation? To probe this question, we design a benchmark to evaluate whether video world models can decouple state evolution from observation. Our benchmark, STEVO-Bench, applies observation control to evolving processes via instructions for inserting an occluder, turning off the light, or specifying camera "lookaway" trajectories. By evaluating video models with and without camera control on a diverse set of naturally occurring evolutions, we expose their limitations in decoupling state evolution from observation. STEVO-Bench proposes an evaluation protocol to automatically detect and disentangle failure modes of video world models across key aspects of natural state evolution. Analysis of STEVO-Bench results provides new insight into potential data and architecture biases of present-day video world models. Project website: https://glab-caltech.github.io/STEVOBench/. Blog: https://ziqi-ma.github.io/blog/2026/outofsight/
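The evaluation design described above crosses a set of naturally occurring evolutions with observation-control instructions, for models with and without camera control. The following minimal Python sketch shows how such a test grid could be enumerated. `ObservationControl`, `TestCase`, and `build_grid` are hypothetical names, not part of STEVO-Bench, and the rule that a lookaway trajectory is only issued to camera-controlled models is an assumption rather than something the abstract states.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ObservationControl(Enum):
    """The three observation-control instructions named in the abstract."""
    OCCLUDER = "occluder insertion"
    LIGHTS_OFF = "turning off the light"
    LOOKAWAY = "camera lookaway trajectory"


@dataclass(frozen=True)
class TestCase:
    evolution: str               # a naturally occurring process, e.g. "ice melting"
    control: ObservationControl  # how the process is hidden from view
    camera_control: bool         # whether the evaluated model accepts camera control


def build_grid(evolutions, camera_settings=(False, True)):
    """Cross every evolution with every applicable observation control.

    Assumption (not from the paper): a lookaway trajectory can only be
    specified for models that accept explicit camera control.
    """
    cases = []
    for evolution, control, camera in product(evolutions, ObservationControl, camera_settings):
        if control is ObservationControl.LOOKAWAY and not camera:
            continue  # skip lookaway for models without camera control
        cases.append(TestCase(evolution, control, camera))
    return cases


grid = build_grid(["water pouring", "ice melting"])
print(len(grid))  # → 10
```

With two evolutions, three controls, and two camera settings, the full cross product has 12 cases; dropping the two lookaway cases without camera control leaves 10.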
