LG | AI | CV | May 28

Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments

CMU
arXiv:2601.01075 68.04 citations h-index: 5
AI Analysis

This work addresses the challenge of partial observability in embodied systems by providing a memory structure that respects the smooth, time-parameterized symmetries of sensory input and world dynamics, improving long-horizon prediction for world models.

This paper introduces Flow Equivariant World Modeling, a framework that utilizes time-parameterized symmetries within a latent memory to achieve stable and accurate dynamics prediction over long horizons in partially observed dynamic environments. The latent memory shifts and transforms equivariantly with self-motion and inferred external object motion, maintaining alignment of information about out-of-view regions as time progresses.

Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external objects. These sensory streams and the underlying dynamics of the world obey smooth, time-parameterized symmetries which existing world models ignore. Without a memory that respects this structure, partial observability presents a major obstacle to existing methods: each observation reveals only a fraction of the world, while unobserved regions continue to evolve. In this work, we introduce Flow Equivariant World Modeling, a framework that leverages time-parameterized symmetries within a latent memory for stable and accurate dynamics prediction over long horizons. The latent memory shifts and transforms equivariantly with self-motion and inferred external object motion, keeping information about out-of-view regions aligned as time progresses. We demonstrate the advantage of this framework over state-of-the-art diffusion, memory-augmented, and recurrent world model architectures on 2D and 3D partially observed video world modeling benchmarks. More broadly, our results suggest that predictive representations become more powerful when they are organized in line with the temporal and dynamical structure of the world they model. Project page: https://flowequivariantworldmodels.github.io/
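The idea of a memory that shifts with self-motion can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes a static 2D toroidal grid world, integer agent translations, and an egocentric memory map with the agent at the centre, and it ignores external object motion entirely. All function names (`egocentric`, `observe`, `shift_memory`, `write_observation`, `rollout`) are hypothetical and exist only for illustration.

```python
import numpy as np

def egocentric(world, position):
    """Roll a toroidal world so that the agent sits at the grid centre."""
    H, W = world.shape
    shift = (H // 2 - position[0], W // 2 - position[1])
    return np.roll(world, shift=shift, axis=(0, 1))

def observe(world, position, view=3):
    """Return the view x view window around the agent (view must be odd)."""
    H, W = world.shape
    r = view // 2
    ego = egocentric(world, position)
    return ego[H // 2 - r:H // 2 + r + 1, W // 2 - r:W // 2 + r + 1]

def shift_memory(memory, self_motion):
    """Shift the egocentric memory opposite to the agent's own motion."""
    dy, dx = self_motion
    return np.roll(memory, shift=(-dy, -dx), axis=(0, 1))

def write_observation(memory, observation):
    """Overwrite the central window of the memory with the current view."""
    H, W = memory.shape
    h, w = observation.shape
    top, left = H // 2 - h // 2, W // 2 - w // 2
    updated = memory.copy()
    updated[top:top + h, left:left + w] = observation
    return updated

def rollout(world, start, moves, view=3):
    """Accumulate partial views into a memory that moves with the agent."""
    H, W = world.shape
    position = tuple(start)
    memory = np.full(world.shape, np.nan)  # NaN marks never-observed cells
    memory = write_observation(memory, observe(world, position, view))
    for dy, dx in moves:
        position = ((position[0] + dy) % H, (position[1] + dx) % W)
        memory = shift_memory(memory, (dy, dx))
        memory = write_observation(memory, observe(world, position, view))
    return memory, position

if __name__ == "__main__":
    world = np.arange(64, dtype=float).reshape(8, 8)
    memory, position = rollout(world, (4, 4), [(0, 1), (0, 1), (1, 0)])
    known = ~np.isnan(memory)
    # Every remembered cell still agrees with the world in the agent's frame.
    print(np.array_equal(memory[known], egocentric(world, position)[known]))
```

Because the memory is rolled by the negative of each self-motion step, cells that have left the 3x3 view stay aligned with the agent's current frame, which is the property the abstract describes for out-of-view regions. Handling moving objects would additionally require transforming the memory by an inferred flow, which this sketch does not attempt.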

