CV · AI · Dec 17, 2025

Spatia: Video Generation with Updatable Spatial Memory

Jinjing Zhao, Fangyun Wei, Zhening Liu, Hongyang Zhang, Chang Xu, Yan Lu

arXiv:2512.15716v1 · 14 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This addresses a key limitation of video generation models in applications that require consistent scene geometry, though it appears incremental, since it extends existing memory-based approaches.

The paper tackles long-term spatial and temporal consistency in video generation. It proposes Spatia, a framework that keeps a persistent 3D scene point cloud as spatial memory and updates it via visual SLAM. This improves spatial consistency and enables applications such as camera control and 3D-aware editing.

Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
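The generate-then-update loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption for illustration only. `SpatialMemory`, `generate_clip`, and `slam_static_points` are hypothetical stand-ins that use 3D points in place of pixels and a distance test in place of rendering, with no real video model or SLAM system involved. This is not the authors' implementation. It shows only the control flow: condition on the memory, generate a clip, and write back only static geometry so that moving entities stay out of the persistent memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


@dataclass
class SpatialMemory:
    """Hypothetical persistent 3D point-cloud memory, deduplicated on a voxel grid."""
    voxel_size: float = 1.0
    voxels: Dict[Voxel, Point] = field(default_factory=dict)

    def _key(self, p: Point) -> Voxel:
        # Floor each coordinate onto the voxel grid.
        s = self.voxel_size
        return (int(p[0] // s), int(p[1] // s), int(p[2] // s))

    def update(self, points: List[Point]) -> int:
        """Merge points; the first point seen in a voxel is kept.
        Returns how many voxels were newly filled."""
        added = 0
        for p in points:
            k = self._key(p)
            if k not in self.voxels:
                self.voxels[k] = p
                added += 1
        return added

    def view(self, cam_x: float, radius: float) -> List[Point]:
        """Toy stand-in for rendering the memory from a camera:
        returns stored points within `radius` of the camera along x."""
        return [p for p in self.voxels.values() if abs(p[0] - cam_x) <= radius]


def generate_clip(conditioning: List[Point], cam_x: float, t: int) -> dict:
    """Hypothetical generator stub. It reuses the conditioning points
    (spatial consistency), reveals new static geometry ahead of the camera,
    and adds one moving entity whose position changes every clip."""
    new_static = [(cam_x + 2.0, 0.0, 0.0)]
    dynamic = [(cam_x, float(t), 0.0)]
    return {"static": conditioning + new_static, "dynamic": dynamic}


def slam_static_points(clip: dict) -> List[Point]:
    """Hypothetical stand-in for visual SLAM: keep only the static reconstruction."""
    return clip["static"]


def run(num_clips: int, step: float = 1.0) -> SpatialMemory:
    memory = SpatialMemory()
    cam_x = 0.0
    for t in range(num_clips):
        cond = memory.view(cam_x, radius=3.0)     # 1. condition on spatial memory
        clip = generate_clip(cond, cam_x, t)      # 2. generate the next clip
        memory.update(slam_static_points(clip))   # 3. write back static geometry only
        cam_x += step                             # camera follows an explicit trajectory
    return memory
```

Running `run(3)` leaves three static voxels in memory, while the moving entity's positions never enter it. Keeping dynamic content out of the persistent store is the intuition behind a dynamic-static split; how Spatia actually separates the two is described in the paper, not here.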

