CV · AI · Mar 27

MemCam: Memory-Augmented Camera Control for Consistent Video Generation

arXiv:2603.26193 · 63.42 citations · h-index: 3 · Has Code
AI Analysis

This work addresses a key challenge in video generation for applications such as scene simulation and video creation, offering an incremental improvement over existing methods by enhancing consistency in long sequences.

The paper tackles the problem of maintaining scene consistency in interactive video generation under dynamic camera control, particularly for long videos with large rotations. It proposes MemCam, a memory-augmented approach that uses compressed historical frames as context and significantly outperforms baseline and state-of-the-art methods on scene-consistency metrics.

Interactive video generation has significant potential for scene simulation and video creation. However, existing methods often struggle with maintaining scene consistency during long video generation under dynamic camera control due to limited contextual information. To address this challenge, we propose MemCam, a memory-augmented interactive video generation approach that treats previously generated frames as external memory and leverages them as contextual conditioning to achieve controllable camera viewpoints with high scene consistency. To enable longer and more relevant context, we design a context compression module that encodes memory frames into compact representations and employs co-visibility-based selection to dynamically retrieve the most relevant historical frames, thereby reducing computational overhead while enriching contextual information. Experiments on interactive video generation tasks show that MemCam significantly outperforms existing baseline methods as well as open-source state-of-the-art approaches in terms of scene consistency, particularly in long video scenarios with large camera rotations.
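The abstract does not say how co-visibility is computed, so what follows is only a minimal sketch under stated assumptions, not MemCam's actual selection module. It assumes each frame's camera is reduced to a ground-plane position and a yaw angle with a fixed horizontal field of view. It then approximates the co-visibility between the target view and a memory frame as the fraction of points sampled inside the target camera's viewing wedge that also fall inside the memory camera's wedge. Occlusion and depth limits are ignored. The names `Camera`, `in_view`, `sample_view`, `covisibility`, and `select_memory` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical simplified camera: ground-plane position plus heading."""
    x: float
    z: float
    yaw: float                        # radians; yaw = 0 faces the +z axis
    fov: float = math.radians(90.0)   # horizontal field of view in radians


def in_view(cam: Camera, px: float, pz: float) -> bool:
    """True if point (px, pz) lies inside the camera's horizontal viewing wedge."""
    dx, dz = px - cam.x, pz - cam.z
    if dx == 0.0 and dz == 0.0:
        return False
    # Angle between the camera heading and the direction to the point,
    # wrapped into [-pi, pi).
    angle = math.atan2(dx, dz) - cam.yaw
    angle = (angle + math.pi) % (2.0 * math.pi) - math.pi
    return abs(angle) <= cam.fov / 2.0


def sample_view(cam: Camera, depths=(1.0, 2.0, 4.0, 8.0), n_rays: int = 16):
    """Sample points along evenly spaced rays inside the camera's wedge."""
    points = []
    for d in depths:
        for i in range(n_rays):
            a = cam.yaw - cam.fov / 2.0 + (i + 0.5) / n_rays * cam.fov
            points.append((cam.x + d * math.sin(a), cam.z + d * math.cos(a)))
    return points


def covisibility(target: Camera, memory: Camera) -> float:
    """Fraction of the target view's sample points also seen by the memory camera."""
    points = sample_view(target)
    return sum(in_view(memory, px, pz) for px, pz in points) / len(points)


def select_memory(target: Camera, memory_cams: list, k: int) -> list:
    """Indices of the k memory frames with the highest co-visibility (ties: lower index first)."""
    scored = [(covisibility(target, m), i) for i, m in enumerate(memory_cams)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]
```

For example, with the target camera at the origin facing +z, a memory frame with the identical pose scores 1.0, one rotated by 45° scores 0.5, and one rotated by 180° scores 0.0, so `select_memory(target, [same, back, rotated_45], k=2)` returns `[0, 2]`. Ranking by view overlap rather than recency is one plausible way such a scheme can retrieve older frames once the camera turns back toward a previously seen region, which is the long-video, large-rotation setting the abstract emphasizes.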
