CV · Dec 1, 2025

IC-World: In-Context Generation for Shared World Modeling

arXiv:2512.02793v1 · 2 citations · h-index: 11
AI Analysis

This work addresses the challenge of generating consistent multi-view videos for applications such as virtual reality and robotics. It is an early exploration of shared world modeling with video-based world models.

The paper tackles shared world modeling, in which a model generates multiple videos from input images that show the same world from different camera poses. It proposes IC-World, a framework that combines in-context generation with reinforcement learning and that, according to the authors' experiments, outperforms state-of-the-art methods in both geometry and motion consistency.

Video-based world models have recently garnered increasing attention for their ability to synthesize diverse and dynamic visual environments. In this paper, we focus on shared world modeling, where a model generates multiple videos from a set of input images, each representing the same underlying world in different camera poses. We propose IC-World, a novel generation framework, enabling parallel generation for all input images via activating the inherent in-context generation capability of large video models. We further finetune IC-World via reinforcement learning, Group Relative Policy Optimization, together with two proposed novel reward models to enforce scene-level geometry consistency and object-level motion consistency among the set of generated videos. Extensive experiments demonstrate that IC-World substantially outperforms state-of-the-art methods in both geometry and motion consistency. To the best of our knowledge, this is the first work to systematically explore the shared world modeling problem with video-based world models.
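
The abstract names Group Relative Policy Optimization (GRPO) as the fine-tuning method. As a rough illustration of the group-relative idea only, the sketch below normalizes each sample's reward against the mean and standard deviation of its own sampling group. Everything specific here is an assumption for illustration: the function names, the equal-weight sum of a geometry score and a motion score, and the example numbers are hypothetical, and the paper's actual reward models and training loop are not described in this text.

```python
from statistics import mean, pstdev


def combined_reward(geometry_score, motion_score, w_geo=0.5, w_motion=0.5):
    """Hypothetical scalar reward: a weighted sum of two consistency scores.

    The weights and the idea of a simple weighted sum are assumptions,
    not something stated in the abstract.
    """
    return w_geo * geometry_score + w_motion * motion_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampling group.

    Each sample is scored relative to the other samples drawn for the same
    input, so no separate value network is needed to form a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for a group of four sampled video sets from the same inputs.
geometry_scores = [0.9, 0.6, 0.75, 0.3]
motion_scores = [0.8, 0.7, 0.4, 0.5]

rewards = [combined_reward(g, m) for g, m in zip(geometry_scores, motion_scores)]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f}  advantage={advantage:+.3f}")
```

Samples that score above their group's mean receive a positive advantage and are reinforced; those below the mean receive a negative one. In a full GRPO setup these advantages would weight a clipped policy-gradient objective, which is omitted here.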

