CV · Mar 12

MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens

arXiv:2603.12513 · 79.85 citations
Predicted impact: top 29% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating long, coherent videos for applications in media and AI. It is an incremental improvement over existing sliding-window caches and static attention-sink tokens.

The paper addresses fidelity degradation, identity drift, and motion stagnation in long-horizon video generation. It introduces MemRoPE, a training-free framework that combines evolving memory tokens with online RoPE indexing to maintain temporal coherence and visual fidelity, and it reports outperforming existing methods across minute- to hour-scale generation.

Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual long-term and short-term streams via exponential moving averages, maintaining both global identity and recent dynamics within a fixed-size cache. Online RoPE Indexing caches unrotated keys and applies positional embeddings dynamically at attention time, ensuring the aggregation is free of conflicting positional phases. These two mechanisms are mutually enabling: positional decoupling makes temporal aggregation well-defined, while aggregation makes fixed-size caching viable for unbounded generation. Extensive experiments validate that MemRoPE outperforms existing methods in temporal coherence, visual fidelity, and subject consistency across minute- to hour-scale generation.
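The interaction between the two components can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DualEmaMemory`, the decay values, the key dimension, and the positions assigned to the memory tokens at attention time are all hypothetical choices made for illustration, and the abstract does not specify the exact update rule. The sketch keeps two exponential moving averages of unrotated keys (a slow and a fast stream) and applies a standard rotary embedding only when the keys are read. For comparison, it also averages keys that were rotated before caching, which shows why mixing positional phases is a problem: the averaged vector loses magnitude even though the underlying content never changed.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply a standard rotary embedding to an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def ema_update(memory, key, decay):
    """Elementwise EMA: memory <- decay * memory + (1 - decay) * key."""
    return [decay * m + (1.0 - decay) * k for m, k in zip(memory, key)]

def norm(vec):
    return math.sqrt(sum(v * v for v in vec))

class DualEmaMemory:
    """Hypothetical fixed-size memory over UNROTATED keys (illustrative only)."""

    def __init__(self, long_decay=0.99, short_decay=0.5):
        self.long_decay = long_decay    # slow stream: global identity (assumed value)
        self.short_decay = short_decay  # fast stream: recent dynamics (assumed value)
        self.long_term = None
        self.short_term = None

    def update(self, raw_key):
        """Fold one unrotated key into both streams."""
        if self.long_term is None:
            self.long_term = list(raw_key)
            self.short_term = list(raw_key)
            return
        self.long_term = ema_update(self.long_term, raw_key, self.long_decay)
        self.short_term = ema_update(self.short_term, raw_key, self.short_decay)

    def keys_for_attention(self, long_pos, short_pos):
        """Rotate the memory tokens only now, at attention time."""
        return [
            rope_rotate(self.long_term, long_pos),
            rope_rotate(self.short_term, short_pos),
        ]

# Toy stream: the same key content arrives at 50 consecutive positions.
key = [1.0, 0.0, 1.0, 0.0]
memory = DualEmaMemory()
smeared = None  # baseline: EMA over keys rotated BEFORE caching
for t in range(50):
    memory.update(key)
    rotated = rope_rotate(key, t)
    smeared = rotated if smeared is None else ema_update(smeared, rotated, 0.5)

memory_keys = memory.keys_for_attention(long_pos=0, short_pos=1)
print("unrotated EMA norm: ", norm(memory.short_term))
print("pre-rotated EMA norm:", norm(smeared))
```

In this toy setting the unrotated averages keep the full magnitude of the key, while the pre-rotated average shrinks because vectors with different phases partially cancel. The cache size stays constant at two memory tokens regardless of how many keys have been folded in.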

