CV · Mar 12

InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model

InSpatio Team, Xiaoyu Zhang, Weihong Pan, Zhichao Ye, Jialin Liu, Yipeng Chen, Nan Wang, Xiaojun Xiang, Weijian Xie, Yifu Wang, Haoyu Ji, Siji Pan

arXiv:2603.11911v1 · 16.85 citations · h-index: 7 · Has Code

Predicted impact: top 5% in CV · last 90 days · Originality: Incremental advance

AI Analysis

InSpatio-WorldFM offers an efficient alternative to traditional video-based world models for real-time world simulation, though the advance appears incremental because it builds on existing image diffusion models.

The paper tackles the problem of high latency in video-based world models by proposing InSpatio-WorldFM, a real-time frame model that generates frames independently, achieving strong multi-view consistency and enabling interactive exploration on consumer-grade GPUs.

We present InSpatio-WorldFM, an open-source real-time frame model for spatial intelligence. Unlike video-based world models that rely on sequential frame generation and incur substantial latency due to window-level processing, InSpatio-WorldFM adopts a frame-based paradigm that generates each frame independently, enabling low-latency real-time spatial inference. By enforcing multi-view spatial consistency through explicit 3D anchors and implicit spatial memory, the model preserves global scene geometry while maintaining fine-grained visual details across viewpoint changes. We further introduce a progressive three-stage training pipeline that transforms a pretrained image diffusion model into a controllable frame model and finally into a real-time generator through few-step distillation. Experimental results show that InSpatio-WorldFM achieves strong multi-view consistency while supporting interactive exploration on consumer-grade GPUs, providing an efficient alternative to traditional video-based world models for real-time world simulation.
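As a rough illustration of the frame-based paradigm described in the abstract, the toy Python sketch below produces each frame from the requested camera pose, a rendered anchor, and a memory lookup, without reading any previously generated frame. Every name here (`SpatialMemory`, `render_anchor`, `generate_frame`, the position-only `Pose`) is a hypothetical placeholder chosen for this example; none of it is taken from the released InSpatio-WorldFM code, and strings stand in for images.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Optional, Tuple

# Assumption: a pose is just a camera position; a real system would include rotation.
Pose = Tuple[float, float, float]


@dataclass
class SpatialMemory:
    """Toy stand-in for spatial memory: a list of (pose, frame) references."""
    entries: List[Tuple[Pose, str]] = field(default_factory=list)

    def add(self, pose: Pose, frame: str) -> None:
        self.entries.append((pose, frame))

    def nearest(self, pose: Pose) -> Optional[str]:
        # Return the stored frame whose pose is closest to the query pose.
        if not self.entries:
            return None
        return min(self.entries, key=lambda entry: dist(entry[0], pose))[1]


def render_anchor(pose: Pose) -> str:
    """Placeholder for rendering an explicit 3D anchor at the given pose."""
    return f"anchor@{pose}"


def generate_frame(pose: Pose, memory: SpatialMemory) -> str:
    """Placeholder generator conditioned only on pose, anchor render, and memory.

    It takes no previously generated frame as input, so each call is
    independent of the order in which other frames were requested.
    """
    reference = memory.nearest(pose)
    return f"frame(pose={pose}, anchor={render_anchor(pose)}, ref={reference})"
```

Because `generate_frame` has no dependence on earlier outputs, requesting the same pose twice gives the same result regardless of what was generated in between, which is the property that lets a frame model respond to one camera move at a time rather than waiting on a window of frames.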
