CV | Dec 2, 2024

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

arXiv:2412.01801v2 | 25 citations | h-index: 45 | CVPR
Originality: Incremental advance
AI Analysis

SceneFactor addresses the need for precise, intuitive editing in 3D scene generation for applications such as virtual reality and game design, though it appears incremental because it builds on existing diffusion models.

The paper tackles the problem of controllable 3D scene generation by introducing SceneFactor, a diffusion-based method that uses factored latent manifolds for text-guided synthesis and enables localized editing via semantic 3D proxy boxes, achieving high-fidelity results as demonstrated in experiments.

We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor enables text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds for generation of arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion generates a proxy semantic space composed of semantic 3D boxes that enables controllable editing of generated scenes by adding, removing, or resizing the semantic 3D proxy boxes, which guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.
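
The box-based editing described in the abstract can be pictured with a small sketch. This is not the paper's implementation and contains no diffusion model: it only illustrates how adding, removing, and resizing axis-aligned semantic boxes changes a coarse semantic grid, and how the changed region (the part a geometry stage would presumably need to regenerate) could be located. The names `ProxyBox`, `rasterize`, `count_label`, and `changed_voxels`, the label ids, and the grid size are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProxyBox:
    """Hypothetical axis-aligned semantic box in voxel coordinates."""
    label: int   # semantic class id; 0 is reserved for empty space
    lo: tuple    # inclusive minimum corner (x, y, z)
    hi: tuple    # exclusive maximum corner (x, y, z)


def rasterize(boxes, shape):
    """Paint boxes into a dense label grid; later boxes overwrite earlier ones."""
    sx, sy, sz = shape
    grid = [[[0] * sz for _ in range(sy)] for _ in range(sx)]
    for b in boxes:
        # Clamp each box to the grid bounds before filling it.
        for x in range(max(b.lo[0], 0), min(b.hi[0], sx)):
            for y in range(max(b.lo[1], 0), min(b.hi[1], sy)):
                for z in range(max(b.lo[2], 0), min(b.hi[2], sz)):
                    grid[x][y][z] = b.label
    return grid


def count_label(grid, label):
    """Number of voxels carrying the given label."""
    return sum(v == label for plane in grid for row in plane for v in row)


def changed_voxels(before, after):
    """Coordinates whose label differs between two grids of equal shape."""
    return [
        (x, y, z)
        for x, plane in enumerate(before)
        for y, row in enumerate(plane)
        for z, v in enumerate(row)
        if v != after[x][y][z]
    ]


# Assumed label ids for this sketch: 1 = bed, 2 = table, 3 = chair.
shape = (8, 8, 4)
bed = ProxyBox(1, (0, 0, 0), (4, 3, 2))
table = ProxyBox(2, (5, 5, 0), (7, 7, 2))
before = rasterize([bed, table], shape)

# Three edits: resize the bed, remove the table, add a chair.
bigger_bed = replace(bed, hi=(5, 3, 2))
chair = ProxyBox(3, (6, 0, 0), (8, 2, 2))
after = rasterize([bigger_bed, chair], shape)

# Only these voxels differ; everything else could be left untouched.
edit_region = changed_voxels(before, after)
print(len(edit_region), "voxels changed")
```

Keeping the edit local to `edit_region` mirrors the abstract's point that box edits allow localized, consistent changes, whereas a text prompt alone gives no such spatial handle.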
