CV · Dec 2, 2024

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation

Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai

arXiv:2412.01801v2 · 17.825 citations · h-index: 45 · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the need for precise, intuitive editing in 3D scene generation for applications such as virtual reality and game design, though it appears incremental because it builds on existing diffusion models.

The paper tackles the problem of controllable 3D scene generation by introducing SceneFactor, a diffusion-based method that uses factored latent manifolds for text-guided synthesis and enables localized editing via semantic 3D proxy boxes, achieving high-fidelity results as demonstrated in experiments.

We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor enables text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds for generation of arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion generates a proxy semantic space composed of semantic 3D boxes that enables controllable editing of generated scenes by adding, removing, or changing the size of the semantic 3D proxy boxes, which in turn guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.
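The box-level editing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `ProxyBox` class, the integer label ids, the grid size, and the idea of regenerating geometry only where the semantic map changed are all assumptions made for illustration. It shows only how adding, removing, and resizing semantic boxes alters a coarse semantic voxel map that a geometry generator could then be conditioned on.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

EMPTY = 0  # hypothetical label id for free space


@dataclass(frozen=True)
class ProxyBox:
    """Hypothetical axis-aligned semantic box in voxel coordinates."""
    label: int                 # illustrative class id, e.g. 1 = bed
    lo: Tuple[int, int, int]   # inclusive minimum corner
    hi: Tuple[int, int, int]   # exclusive maximum corner


def rasterize(boxes: List[ProxyBox], shape: Tuple[int, int, int]) -> np.ndarray:
    """Paint boxes into a semantic voxel grid; later boxes overwrite earlier ones."""
    grid = np.full(shape, EMPTY, dtype=np.int32)
    for b in boxes:
        (x0, y0, z0), (x1, y1, z1) = b.lo, b.hi
        grid[x0:x1, y0:y1, z0:z1] = b.label
    return grid


def edit_mask(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Boolean mask of voxels whose semantic label changed between two maps."""
    return before != after


shape = (8, 8, 4)
bed = ProxyBox(label=1, lo=(0, 0, 0), hi=(4, 2, 2))
chair = ProxyBox(label=2, lo=(5, 5, 0), hi=(7, 7, 2))
before = rasterize([bed, chair], shape)

# Three edits: enlarge the bed, remove the chair, add a table.
bigger_bed = ProxyBox(label=1, lo=(0, 0, 0), hi=(4, 3, 2))
table = ProxyBox(label=3, lo=(5, 0, 0), hi=(7, 2, 2))
after = rasterize([bigger_bed, table], shape)

mask = edit_mask(before, after)
changed = int(mask.sum())  # number of voxels affected by the edits
```

In this toy setup the mask marks only the voxels touched by the three edits, so the rest of the scene is left as it was. Whether SceneFactor localizes edits in this way is not stated in the abstract.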
