CV | AI | Apr 3, 2025

Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model

arXiv:2504.02764v1 | 10 citations | h-index: 13 | CVPR
Originality: Incremental advance
AI Analysis

This paper addresses 3D scene generation from a single image for computer vision applications, and represents an incremental improvement over existing video diffusion methods.

The paper tackles the problem of generating consistent 3D scenes from single images using video diffusion models, which often suffer from limited video length and scene inconsistency. The proposed Scene Splatter method achieves high-fidelity and consistent novel view synthesis through a cascaded momentum approach.

In this paper, we propose Scene Splatter, a momentum-based paradigm for video diffusion to generate generic scenes from a single image. Existing methods, which employ video generation models to synthesize novel views, suffer from limited video length and scene inconsistency, leading to artifacts and distortions during further reconstruction. To address this issue, we construct noisy samples from original features as momentum to enhance video details and maintain scene consistency. However, for latent features with the perception field that spans both known and unknown regions, such latent-level momentum restricts the generative ability of video diffusion in unknown regions. Therefore, we further introduce the aforementioned consistent video as a pixel-level momentum to a directly generated video without momentum for better recovery of unseen regions. Our cascaded momentum enables video diffusion models to generate both high-fidelity and consistent novel views. We further finetune the global Gaussian representations with enhanced frames and render new frames for momentum update in the next step. In this manner, we can iteratively recover a 3D scene, avoiding the limitation of video length. Extensive experiments demonstrate the generalization capability and superior performance of our method in high-fidelity and consistent scene generation.
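The abstract gives no formulas for the cascaded momentum, so the following is only a toy sketch of one way the two stages could be read. It assumes the latent-level momentum is a masked blend between the current latent and a re-noised copy of the reference latent (using the standard diffusion forward-noising rule), and that the pixel-level momentum is a masked blend between the consistent video and the directly generated one. The function names, the `known_mask`, the `strength` weight, and the flat-list "latents" are all hypothetical and are not taken from the paper.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    # Standard diffusion forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps.
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def blend(current, reference, weights):
    # Per-element convex combination: weight 1 keeps reference, 0 keeps current.
    return [w * r + (1.0 - w) * c for c, r, w in zip(current, reference, weights)]


def latent_momentum(z_t, z_ref, known_mask, alpha_bar, strength, rng):
    # ASSUMPTION: momentum is a masked blend of the current latent with a
    # re-noised copy of the reference latent at the same noise level.
    noisy_ref = forward_noise(z_ref, alpha_bar, rng)
    return blend(z_t, noisy_ref, [strength * m for m in known_mask])


def pixel_momentum(direct_frame, consistent_frame, known_mask):
    # ASSUMPTION: known pixels come from the consistent (momentum) video,
    # unknown pixels from the video generated without momentum.
    return blend(direct_frame, consistent_frame, known_mask)


rng = random.Random(0)
z_t = [0.5, -0.2, 0.1, 0.9]      # toy current latent
z_ref = [1.0, 1.0, 1.0, 1.0]     # toy reference latent from rendered views
mask = [1.0, 1.0, 0.0, 0.0]      # 1 = known region, 0 = unseen region

z_next = latent_momentum(z_t, z_ref, mask, alpha_bar=0.9, strength=0.7, rng=rng)
frame = pixel_momentum([0.2, 0.2, 0.2, 0.2], [0.8, 0.8, 0.8, 0.8], mask)
```

In this reading, elements outside the known mask are left entirely to the generator, which mirrors the abstract's point that momentum should not restrict generation in unseen regions. The iterative part of the method (finetuning the Gaussian representation and rendering new frames for the next momentum update) is not modeled here.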

