GR | CV | Apr 2, 2025

Generating 360° Video is What You Need For a 3D Scene

arXiv:2504.02045v4 | 2 citations | h-index: 6 | SIGGRAPH Asia
Originality: Highly original
AI Analysis

The paper addresses two problems that limit virtual reality and 3D modeling applications: scarce scene data and partial scene generation. It represents a novel approach rather than an incremental improvement.

The paper tackles the challenge of generating full 3D scenes with navigational freedom. It introduces WorldPrompter, a pipeline that synthesizes traversable 3D scenes from text prompts using 360° video as an intermediate representation. The generated videos reach an average COLMAP matching rate of 94.6%, and the method outperforms state-of-the-art 360° video generators and 3D scene generation models.

Generating 3D scenes is still a challenging task due to the lack of readily available scene data. Most existing methods only produce partial scenes and provide limited navigational freedom. We introduce a practical and scalable solution that uses 360° video as an intermediate scene representation, capturing the full-scene context and ensuring consistent visual content throughout the generation. We propose WorldPrompter, a generative pipeline that synthesizes traversable 3D scenes from text prompts. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model, trained with a mix of image and video data, achieves convincing spatial and temporal consistency for static scenes. This is validated by an average COLMAP matching rate of 94.6%, allowing for high-quality panoramic Gaussian splat reconstruction and improved navigation throughout the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.
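
To make concrete why a 360° frame captures the full-scene context, the sketch below maps each pixel of a panoramic frame to a viewing direction on the unit sphere and back. It assumes an equirectangular projection and a particular axis convention (+z forward, +x right, +y up); the abstract does not state which projection WorldPrompter uses, and the function names are hypothetical, chosen only for illustration.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular frame to a unit ray (x, y, z).

    Assumed convention (not taken from the paper): longitude runs from
    -pi to pi left to right, latitude from +pi/2 to -pi/2 top to bottom,
    with +z forward, +x right, +y up. Pixel centers sit at half-integers.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: project a (not necessarily unit) ray to pixel (u, v)."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)
```

Because every direction on the sphere lands on some pixel, a single frame covers all viewing directions from the camera position, which is the property that makes such frames a convenient intermediate representation for full-scene reconstruction.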
