CV | Dec 16, 2025

SS4D: Native 4D Generative Model via Structured Spacetime Latents

arXiv:2512.14284v15 citations | h-index: 26 | ACM Trans Graph
Originality: Highly original
AI Analysis

SS4D addresses the challenge of 4D synthesis for applications such as animation and simulation, and represents a novel method rather than an incremental improvement.

The paper tackles the problem of generating dynamic 3D objects from monocular video by introducing SS4D, a native 4D generative model that achieves high fidelity, temporal coherence, and structural consistency, though no concrete numbers are provided in the abstract.

We present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. Unlike prior approaches that construct 4D representations by optimizing over 3D or video generative models, we train a generator directly on 4D data, achieving high fidelity, temporal coherence, and structural consistency. At the core of our method is a compressed set of structured spacetime latents. Specifically: (1) To address the scarcity of 4D training data, we build on a pre-trained single-image-to-3D model, preserving strong spatial consistency. (2) Temporal consistency is enforced by introducing dedicated temporal layers that reason across frames. (3) To support efficient training and inference over long video sequences, we compress the latent sequence along the temporal axis using factorized 4D convolutions and temporal downsampling blocks. In addition, we employ a carefully designed training strategy to enhance robustness against occlusion.
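To make point (3) of the abstract more concrete, here is a minimal toy sketch of what a factorized spacetime convolution followed by temporal downsampling could look like. It is not the SS4D implementation. It assumes a dense latent grid of shape (T, D, H, W), fixed box kernels in place of learned weights, a spatial pass that is itself split per axis, and frame averaging as the downsampling rule. The function names (`conv1d_along_axis`, `factorized_spacetime_conv`, `temporal_downsample`) are hypothetical.

```python
import numpy as np

def conv1d_along_axis(x, kernel, axis):
    """Apply a 1D 'same'-size convolution along a single axis of x."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )

def factorized_spacetime_conv(latents, spatial_kernel, temporal_kernel):
    """Toy factorized 4D convolution on latents of shape (T, D, H, W).

    Assumption: the 4D operation is split into a spatial pass within each
    frame (axes 1..3) followed by a temporal pass across frames (axis 0).
    """
    out = latents
    for axis in (1, 2, 3):  # spatial pass, never mixes frames
        out = conv1d_along_axis(out, spatial_kernel, axis)
    return conv1d_along_axis(out, temporal_kernel, 0)  # temporal pass

def temporal_downsample(latents, factor=2):
    """Average each group of `factor` consecutive frames (assumed rule).

    Leftover frames that do not fill a complete group are dropped.
    """
    t = (latents.shape[0] // factor) * factor
    grouped = latents[:t].reshape(t // factor, factor, *latents.shape[1:])
    return grouped.mean(axis=1)

rng = np.random.default_rng(0)
latents = rng.standard_normal((8, 4, 4, 4))  # 8 frames of a 4x4x4 latent grid
box = np.ones(3) / 3.0                       # stand-in for learned weights

mixed = factorized_spacetime_conv(latents, box, box)
compressed = temporal_downsample(mixed, factor=2)
print(mixed.shape, compressed.shape)  # (8, 4, 4, 4) (4, 4, 4, 4)
```

In this sketch the temporal axis shrinks from 8 frames to 4 while the spatial resolution is unchanged, which is the general effect the abstract attributes to its temporal downsampling blocks. The actual model operates on structured latents with learned layers, so this shows only the shape of the idea.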

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
