CV, Jun 3, 2025

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval

arXiv:2506.03141v2, 108 citations, h-index: 20, SIGGRAPH Asia
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of maintaining scene consistency in long videos for interactive generation applications. It represents an incremental improvement over existing approaches.

The paper tackles scene-consistent memory in long interactive video generation. It proposes Context-as-Memory, which uses historical context frames as memory and adds a retrieval module to reduce computational overhead. The authors report stronger memory capabilities than state-of-the-art methods and generalization to open-domain scenarios.

Recent advances in interactive video generation have shown promising results, yet existing approaches struggle with scene-consistent memory in long video generation because they make limited use of historical context. In this work, we propose Context-as-Memory, which uses historical context as memory for video generation. It includes two simple yet effective designs: (1) storing context in frame format without additional post-processing; (2) conditioning by concatenating the context and the frames to be predicted along the frame dimension at the input, which requires no external control modules. Furthermore, because incorporating all historical context carries an enormous computational overhead, we propose the Memory Retrieval module, which selects the truly relevant context frames by determining field-of-view (FOV) overlap between camera poses. This significantly reduces the number of candidate frames without substantial information loss. Experiments demonstrate that Context-as-Memory achieves superior memory capabilities in interactive long video generation compared to state-of-the-art methods, and even generalizes effectively to open-domain scenarios not seen during training. Project page: https://context-as-memory.github.io/.
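To make the retrieval idea concrete, the following is a minimal sketch of selecting context frames by FOV overlap between camera poses. It is not the paper's implementation: the 2D pose representation, the sampling-based overlap score, and every name in it (`Pose`, `fov_overlap`, `retrieve_context`, `max_range`, `k`) are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point comparisons at the FOV boundary


@dataclass(frozen=True)
class Pose:
    """Hypothetical 2D camera pose: position plus viewing direction (radians)."""
    x: float
    y: float
    yaw: float


def in_fov(pose, px, py, fov, max_range):
    """Return True if point (px, py) lies inside the camera's viewing wedge."""
    dx, dy = px - pose.x, py - pose.y
    dist = math.hypot(dx, dy)
    if dist > max_range + EPS:
        return False
    if dist == 0.0:
        return True
    bearing = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi).
    diff = (bearing - pose.yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2 + EPS


def sample_fov(pose, fov, max_range, n_angles=9, n_radii=5):
    """Sample a grid of points covering the camera's viewing wedge."""
    points = []
    for i in range(n_angles):
        angle = pose.yaw - fov / 2 + fov * i / (n_angles - 1)
        for j in range(1, n_radii + 1):
            r = max_range * j / n_radii
            points.append((pose.x + r * math.cos(angle),
                           pose.y + r * math.sin(angle)))
    return points


def fov_overlap(query, candidate, fov=math.radians(90), max_range=10.0):
    """Fraction of the query's sampled view that the candidate also sees."""
    points = sample_fov(query, fov, max_range)
    hits = sum(in_fov(candidate, px, py, fov, max_range) for px, py in points)
    return hits / len(points)


def retrieve_context(query, history, k=2):
    """Return indices of up to k history poses with the largest nonzero overlap."""
    scored = [(fov_overlap(query, pose), idx) for idx, pose in enumerate(history)]
    scored = [(score, idx) for score, idx in scored if score > 0.0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    query = Pose(0.0, 0.0, 0.0)
    history = [
        Pose(0.0, 0.0, math.pi),            # same spot, facing away
        Pose(0.0, 0.0, 0.0),                # identical view
        Pose(0.0, 0.0, math.radians(30)),   # partially overlapping view
        Pose(100.0, 100.0, 0.0),            # far away
    ]
    print(retrieve_context(query, history, k=2))
```

In this sketch, frames whose cameras face away from the query view or sit far from it score zero and are never selected, which mirrors the stated goal of shrinking the candidate set without discarding frames that show the same part of the scene.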

