CV · AI · Nov 19, 2024

SSEditor: Controllable Mask-to-Scene Generation with Diffusion Model

arXiv:2411.12290v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable and efficient 3D scene generation in applications such as urban scene construction, though it appears incremental because it builds on existing diffusion-based methods.

The paper tackles the limited controllability and flexibility of 3D semantic scene generation. It proposes SSEditor, a two-stage diffusion-based framework that generates specified target categories without multiple-step resampling, and reports improved controllability, flexibility, and quality on the SemanticKITTI and CarlaSC datasets.
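For readers unfamiliar with mask-conditional diffusion, the generic DDPM formulation below shows where a condition such as a mask enters. It is a hedged sketch of the standard ε-prediction setup, not SSEditor's published loss: z_0 (a clean triplane latent), M (the mask condition), the noise schedule ᾱ_t, and the denoiser ε_θ are assumed notation.

```latex
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}(\theta) = \mathbb{E}_{z_0, M, \epsilon, t}\left[\, \lVert \epsilon - \epsilon_\theta(z_t, t, M) \rVert_2^2 \,\right]
```

Because M is an input to every call of ε_θ, each reverse step from z_t to z_{t-1} is evaluated once. Editing with an unconditional model instead typically relies on inpainting schemes, for example RePaint, which re-noise and re-denoise each step several times so that the generated region stays consistent with the known one; that repetition is the multiple-step resampling the summary refers to.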

Recent advancements in 3D diffusion-based semantic scene generation have gained attention. However, existing methods rely on unconditional generation and require multiple resampling steps when editing scenes, which significantly limits their controllability and flexibility. To this end, we propose SSEditor, a controllable Semantic Scene Editor that can generate specified target categories without multiple-step resampling. SSEditor employs a two-stage diffusion-based framework: (1) a 3D scene autoencoder is trained to obtain latent triplane features, and (2) a mask-conditional diffusion model is trained for customizable 3D semantic scene generation. In the second stage, we introduce a geometric-semantic fusion module that enhances the model's ability to learn geometric and semantic information. This ensures that objects are generated with correct positions, sizes, and categories. Extensive experiments on SemanticKITTI and CarlaSC demonstrate that SSEditor outperforms previous approaches in terms of controllability and flexibility in target generation, as well as the quality of semantic scene generation and reconstruction. More importantly, experiments on the unseen Occ-3D Waymo dataset show that SSEditor is capable of generating novel urban scenes, enabling the rapid construction of 3D scenes.
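A triplane latent like the one produced in stage (1) is commonly decoded by projecting each 3D query point onto three axis-aligned feature planes (xy, xz, yz), sampling each plane, and aggregating the three features before a small head predicts the semantic class. The sketch below illustrates that generic triplane lookup in pure Python. Summation as the aggregation, corner-aligned bilinear sampling, coordinates normalized to [0, 1], and the linear `decode_label` head are all assumptions for illustration, not SSEditor's published decoder, and every function name here is hypothetical.

```python
"""Hypothetical triplane lookup sketch (not SSEditor's actual decoder)."""
from typing import List, Sequence

# A feature plane stored as [rows][cols][channels]; columns index the first
# axis of the plane (e.g. x for the xz plane), rows index the second (e.g. z).
Plane = List[List[List[float]]]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature vector at u (column axis), v (row axis) in [0, 1].

    Corner-aligned: u = 0 and u = 1 land exactly on the first and last columns.
    """
    rows, cols = len(plane), len(plane[0])
    x, y = u * (cols - 1), v * (rows - 1)  # continuous grid coordinates
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    dx, dy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - dx) + plane[y0][x1][c] * dx
        bottom = plane[y1][x0][c] * (1 - dx) + plane[y1][x1][c] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def query_triplane(xy: Plane, xz: Plane, yz: Plane, p: Sequence[float]) -> List[float]:
    """Project point p onto the three planes and sum the sampled features (assumed aggregation)."""
    x, y, z = (min(max(t, 0.0), 1.0) for t in p)  # clamp to the normalized scene box
    f_xy = bilinear(xy, x, y)
    f_xz = bilinear(xz, x, z)
    f_yz = bilinear(yz, y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def decode_label(feature: Sequence[float], weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> int:
    """Hypothetical linear head: logits = W f + b; returns the first argmax class index."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def decode_grid(xy: Plane, xz: Plane, yz: Plane, n: int,
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[List[List[int]]]:
    """Decode a label for every voxel center of an n x n x n grid, indexed [ix][iy][iz]."""
    centers = [(i + 0.5) / n for i in range(n)]
    return [[[decode_label(query_triplane(xy, xz, yz, (cx, cy, cz)), weights, bias)
              for cz in centers] for cy in centers] for cx in centers]


if __name__ == "__main__":
    import random
    from collections import Counter

    random.seed(0)
    res, ch, classes = 8, 4, 3  # toy sizes chosen for illustration, not the paper's

    def rand_plane() -> Plane:
        return [[[random.uniform(-1, 1) for _ in range(ch)] for _ in range(res)]
                for _ in range(res)]

    xy, xz, yz = rand_plane(), rand_plane(), rand_plane()
    w = [[random.uniform(-1, 1) for _ in range(ch)] for _ in range(classes)]
    b = [0.0] * classes
    grid = decode_grid(xy, xz, yz, 4, w, b)
    print(Counter(v for sheet in grid for row in sheet for v in row))
```

Summing the three plane features keeps the feature width fixed at the plane's channel count; concatenation is a common alternative that triples the width the decoder head sees.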

Code Implementations: 1 repo
