CV | Mar 21, 2023

Compositional 3D Scene Generation using Locally Conditioned Diffusion

arXiv:2303.12218v2 | 121 citations | h-index: 76
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of intuitive 3D scene design for users in fields such as computer graphics and AI, offering a more automated and controllable approach. The advance appears incremental because it builds on existing text-to-3D models.

Generating complex 3D scenes is typically a manual task that requires expertise. The paper introduces a method for compositional 3D scene generation that gives control over semantic parts through text prompts and bounding boxes, and it reports higher fidelity than the baselines it compares against.

Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text prompts and bounding boxes while ensuring seamless transitions between these parts. We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
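The abstract does not spell out how the per-region conditioning is applied, so the following is only a minimal sketch of one plausible reading: at a single denoising step, noise predictions conditioned on different prompts are blended using masks derived from the bounding boxes. Every name here (`box_mask`, `compose_noise`, `toy_predictor`) is hypothetical, the toy predictor stands in for a real text-conditioned diffusion model, and nothing below should be taken as the authors' implementation.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Binary mask that is 1 inside the bounding box and 0 elsewhere."""
    r0, c0, r1, c1 = box
    return [[1 if r0 <= r < r1 and c0 <= c < c1 else 0 for c in range(width)]
            for r in range(height)]


def compose_noise(x_t: Grid,
                  regions: List[Tuple[str, Box]],
                  background_prompt: str,
                  predict_noise: Callable[[Grid, str], Grid]) -> Grid:
    """Blend per-prompt noise predictions using bounding-box masks.

    Each pixel takes the prediction of the last region that covers it;
    pixels outside every box fall back to the background prompt.
    (Assumed blending rule for illustration only.)
    """
    height, width = len(x_t), len(x_t[0])
    out = [row[:] for row in predict_noise(x_t, background_prompt)]
    for prompt, box in regions:
        eps = predict_noise(x_t, prompt)
        mask = box_mask(height, width, box)
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    out[r][c] = eps[r][c]
    return out


def toy_predictor(x_t: Grid, prompt: str) -> Grid:
    """Stand-in for a text-conditioned denoiser: a constant per prompt."""
    value = float(len(prompt))
    return [[value for _ in row] for row in x_t]


# Two prompts, each owning half of a 2x4 "image".
x_t = [[0.0] * 4 for _ in range(2)]
composed = compose_noise(
    x_t,
    regions=[("a lamp", (0, 0, 2, 2)), ("a bookshelf", (0, 2, 2, 4))],
    background_prompt="room",
    predict_noise=toy_predictor,
)
```

Because every prompt's prediction is computed from the same full noisy input `x_t`, each region is denoised with awareness of its neighbours, which is one way the "seamless transitions" mentioned in the abstract could arise under this assumed scheme.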

