CV
Apr 16, 2025

WORLDMEM: Long-term Consistent World Simulation with Memory

arXiv:2504.12369v1
101 citations
h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of keeping virtual environments consistent over time, for applications such as simulation and interaction. It is an incremental improvement over existing methods.

The paper tackles the limited temporal context window in world simulation, which causes failures in long-term consistency, particularly in 3D spatial consistency. It introduces WorldMem, a framework that pairs a memory bank with a memory attention mechanism to reconstruct previously observed scenes across significant viewpoint or temporal gaps and to model the world's dynamic evolution over time. Experiments in virtual and real scenarios support its effectiveness.

World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in preserving 3D spatial consistency. In this work, we present WorldMem, a framework that enhances scene generation with a memory bank consisting of memory units that store memory frames and states (e.g., poses and timestamps). By employing a memory attention mechanism that effectively extracts relevant information from these memory frames based on their states, our method is capable of accurately reconstructing previously observed scenes, even under significant viewpoint or temporal gaps. Furthermore, by incorporating timestamps into the states, our framework not only models a static world but also captures its dynamic evolution over time, enabling both perception and interaction within the simulated world. Extensive experiments in both virtual and real scenarios validate the effectiveness of our approach.
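To make the memory-bank idea concrete, here is a minimal Python sketch of state-based retrieval. It is an illustration under stated assumptions, not the paper's implementation: WorldMem uses a learned memory attention mechanism, whereas this sketch swaps in a hand-written distance over pose and timestamp followed by a softmax. The names `MemoryUnit`, `MemoryBank`, `state_distance`, `attention_weights`, the pose layout, and the `time_weight` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class MemoryUnit:
    frame: Any                # stored frame (image or latent); treated as opaque here
    pose: Tuple[float, ...]   # camera pose; the exact layout is an assumption
    timestamp: float          # when the frame was observed


def state_distance(unit: MemoryUnit, pose: Sequence[float],
                   timestamp: float, time_weight: float = 0.1) -> float:
    """Hand-written stand-in for learned state similarity (an assumption)."""
    pose_gap = math.dist(unit.pose, tuple(pose))
    time_gap = abs(unit.timestamp - timestamp)
    return pose_gap + time_weight * time_gap


class MemoryBank:
    def __init__(self) -> None:
        self.units: List[MemoryUnit] = []

    def add(self, frame: Any, pose: Sequence[float], timestamp: float) -> None:
        self.units.append(MemoryUnit(frame, tuple(pose), float(timestamp)))

    def retrieve(self, pose: Sequence[float], timestamp: float,
                 k: int = 2, time_weight: float = 0.1) -> List[MemoryUnit]:
        """Return the k units whose states are closest to the query state."""
        ranked = sorted(
            self.units,
            key=lambda u: state_distance(u, pose, timestamp, time_weight),
        )
        return ranked[:k]


def attention_weights(units: Sequence[MemoryUnit], pose: Sequence[float],
                      timestamp: float, time_weight: float = 0.1) -> List[float]:
    """Softmax over negative state distances: closer states get more weight."""
    if not units:
        return []
    scores = [-state_distance(u, pose, timestamp, time_weight) for u in units]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


bank = MemoryBank()
bank.add("frame_a", (0.0, 0.0, 0.0), timestamp=0)
bank.add("frame_b", (10.0, 0.0, 0.0), timestamp=1)
bank.add("frame_c", (0.5, 0.0, 0.0), timestamp=50)

query_pose, query_time = (0.0, 0.0, 0.0), 2
retrieved = bank.retrieve(query_pose, query_time, k=2)
weights = attention_weights(retrieved, query_pose, query_time)
```

In this toy setup the query returns to the origin, so the frame observed there long ago is retrieved ahead of a more recent frame seen from far away. That mirrors the abstract's point that states (poses and timestamps), rather than recency alone, decide which memory frames are relevant.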

