CV · Feb 4

PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

arXiv:2602.04876v1 · 6 citations · h-index: 15
AI Analysis

This paper addresses the challenge of generating realistic, physically plausible 4D scenes for robotics, simulation, and virtual reality. It is framed as a novel approach rather than an incremental improvement.

The paper tackles long-horizon, action-conditioned 4D scene generation from a single image. It introduces PerpetualWonder, a hybrid generative simulator built as a closed-loop system around a unified representation that links physical state and visual primitives. This lets it simulate complex, multi-step interactions while maintaining physical plausibility and visual consistency.

We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Current works fail at this task because their physical state is decoupled from their visual representation, which prevents generative refinements to update the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It features a novel unified representation that creates a bidirectional link between the physical state and visual primitives, allowing generative refinements to correct both the dynamics and appearance. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments demonstrate that from a single image, PerpetualWonder can successfully simulate complex, multi-step interactions from long-horizon actions, maintaining physical plausibility and visual consistency.
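The abstract's point that supervision from multiple viewpoints resolves optimization ambiguity can be illustrated with a toy example. The sketch below is not PerpetualWonder's update mechanism: the orthographic "views", the single 3D point, and the names `VIEWS`, `project`, and `fit_point` are all assumptions made for illustration. It only shows that a single view leaves depth unconstrained, while a second view pins it down.

```python
# Toy illustration only: NOT the paper's actual update rule.
# Each hypothetical "view" is an orthographic camera that keeps two of the
# three coordinates of a point and discards the third (its depth axis).
VIEWS = {
    "front": (0, 1),  # looks along z, observes (x, y)
    "side": (1, 2),   # looks along x, observes (y, z)
}


def project(point, axes):
    """Orthographic projection: keep only the coordinates listed in `axes`."""
    return tuple(point[a] for a in axes)


def fit_point(initial, observations, steps=200, lr=0.2):
    """Gradient descent on the summed squared reprojection error.

    `observations` maps a view name to the 2D target seen from that view.
    """
    p = list(initial)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for name, target in observations.items():
            for axis, t in zip(VIEWS[name], target):
                # d/dp of (p - t)^2; axes unseen by a view get no gradient.
                grad[axis] += 2.0 * (p[axis] - t)
        p = [c - lr * g for c, g in zip(p, grad)]
    return p


true_point = (1.0, 2.0, 3.0)
start = (0.0, 0.0, 0.0)

# Supervision from one view only: z never receives a gradient.
one_view = fit_point(start, {"front": project(true_point, VIEWS["front"])})

# Supervision from both views: every coordinate is constrained.
two_views = fit_point(
    start, {name: project(true_point, axes) for name, axes in VIEWS.items()}
)

print("one view :", [round(c, 3) for c in one_view])
print("two views:", [round(c, 3) for c in two_views])
```

With only the front view, the z coordinate stays at its initial value because no observation depends on it. Adding the side view removes that ambiguity, which is the general intuition behind gathering supervision from several viewpoints.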

