CV · Jun 15, 2025

Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors

arXiv:2506.12716v1 · h-index: 28
Originality: Highly original
AI Analysis

This paper addresses scene reconstruction and novel view synthesis in complex, cluttered environments, a known bottleneck for computer vision and graphics applications, and proposes a new method for it.

The paper tackles the generation of dynamic 4D scenes from monocular, multi-object videos with heavy occlusions. It introduces GenMOJO, which combines deformable 3D Gaussian optimization with generative priors for view synthesis, and reports more realistic novel views and more accurate point tracks than existing approaches.

We tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered scenes. To address this, GenMOJO decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. This object-wise decomposition allows leveraging object-centric diffusion models to infer unobserved regions in novel viewpoints. GenMOJO then performs joint Gaussian splatting to render the full scene, capturing cross-object occlusions and enabling occlusion-aware supervision. To bridge the gap between object-centric priors and the global frame-centric coordinate system of videos, GenMOJO uses differentiable transformations that align generative and rendering constraints within a unified framework. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input. Quantitative evaluations and perceptual human studies confirm that GenMOJO generates more realistic novel views of scenes and produces more accurate point tracks than existing approaches.
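The two ideas the abstract relies on, mapping object-centric Gaussians into the shared scene frame and rendering all objects jointly so that nearer objects occlude farther ones, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `to_scene_frame` and `composite_along_ray`, the similarity transform (scale, rotation, translation), and the single-ray setup with a camera at the origin looking along +z are all assumptions made for illustration.

```python
import numpy as np

def to_scene_frame(centers_obj, scale, rotation, translation):
    """Map Gaussian centers (N, 3) from an object-centric frame to the scene frame.

    Hypothetical similarity transform: x_scene = scale * R @ x_obj + t.
    """
    return scale * centers_obj @ rotation.T + translation

def composite_along_ray(depths, alphas, colors):
    """Front-to-back alpha compositing of every Gaussian hit by one ray.

    Gaussians from all objects are sorted together by depth, so a nearer
    object reduces the contribution of a farther one (cross-object occlusion).
    """
    order = np.argsort(depths)
    transmittance = 1.0
    pixel = np.zeros(3)
    weights = np.zeros(len(depths))
    for i in order:
        weights[i] = transmittance * alphas[i]
        pixel += weights[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return pixel, weights

# Two toy objects, each with one Gaussian at its own object-frame origin.
identity = np.eye(3)
center_a = to_scene_frame(np.zeros((1, 3)), 1.0, identity, np.array([0.0, 0.0, 2.0]))
center_b = to_scene_frame(np.zeros((1, 3)), 1.0, identity, np.array([0.0, 0.0, 5.0]))

# Assumed camera: at the origin, looking along +z, so depth is the z coordinate.
depths = np.array([center_a[0, 2], center_b[0, 2]])
alphas = np.array([0.9, 0.9])
colors = np.array([[1.0, 0.0, 0.0],   # object A: red, nearer
                   [0.0, 0.0, 1.0]])  # object B: blue, farther

pixel, weights = composite_along_ray(depths, alphas, colors)
print(pixel, weights)
```

In this toy setup the nearer red Gaussian receives most of the rendering weight and the blue one behind it is largely occluded. Because every step is a differentiable arithmetic operation, an image-space loss on `pixel` could in principle propagate gradients back to both the per-object Gaussians and the object-to-scene transform, which is the kind of coupling the abstract describes.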