CV · AI · GR · Mar 24, 2024

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

arXiv:2403.16210v2 · 22 citations · h-index: 8 · SIGGRAPH Asia
Originality: Highly original
AI Analysis

The paper addresses the need for efficient generation of 3D scenes whose parts are kept separate, which applications such as re-texturing and object rearrangement require; it proposes a novel method for this known bottleneck.

The paper tackles the problem of generating semantic-compositional 3D scenes by introducing Frankenstein, a diffusion-based framework that produces multiple separated shapes in a single pass, demonstrating promising results for room interiors and human avatars.

We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in a single tri-plane tensor, from which multiple Signed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and a denoising diffusion process is then employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement within a room, or avatar cloth re-targeting. Our project page is available at: https://wolfball.github.io/frankenstein/.
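The key representational idea above is that one tri-plane can hold several part-wise SDF fields at once. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the common tri-plane convention (three axis-aligned feature planes, bilinear sampling, features summed across planes; concatenation is also common) and replaces the paper's learned decoder with a hypothetical linear map that outputs one SDF value per part. Parts are combined as a union (minimum SDF), with the arg-min serving as each point's part label. All helper names (`make_plane`, `decode_part_sdfs`, `build_demo_scene`, etc.), the plane resolution, and the hand-built two-part scene are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# A feature plane stored as plane[row][col][channel]; rows index the second
# plane coordinate (v), columns the first (u), both spanning [-1, 1].
Plane = List[List[List[float]]]


def make_plane(res: int, fn: Callable[[float, float], List[float]]) -> Plane:
    """Fill a res x res plane by evaluating fn(u, v) at the grid nodes."""
    coords = [i / (res - 1) * 2.0 - 1.0 for i in range(res)]
    return [[fn(u, v) for u in coords] for v in coords]


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate all channels of a plane at (u, v) in [-1, 1]^2."""
    res = len(plane)
    # Map [-1, 1] onto grid indices [0, res - 1], clamping points outside.
    x = min(max((u + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    y = min(max((v + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    wx, wy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1.0 - wx) + plane[y0][x1][c] * wx
        bottom = plane[y1][x0][c] * (1.0 - wx) + plane[y1][x1][c] * wx
        out.append(top * (1.0 - wy) + bottom * wy)
    return out


def triplane_features(planes: Tuple[Plane, Plane, Plane], p: Sequence[float]) -> List[float]:
    """Project p onto the xy, xz and yz planes and sum the sampled features."""
    x, y, z = p
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def decode_part_sdfs(feature: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Hypothetical linear decoder: maps one feature vector to one SDF per part."""
    return [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, biases)]


def compose(part_sdfs: List[float]) -> Tuple[float, int]:
    """Union of the parts: the scene SDF is the minimum, the label its arg-min."""
    label = min(range(len(part_sdfs)), key=lambda k: part_sdfs[k])
    return part_sdfs[label], label


def scene_sdf(planes, p, weights, biases) -> Tuple[float, int]:
    """Query one tri-plane for every part at point p, then compose the parts."""
    return compose(decode_part_sdfs(triplane_features(planes, p), weights, biases))


def build_demo_scene():
    """Hand-built 2-part scene (a toy stand-in for a generated tri-plane).

    Part 0 is the half-space x < -0.2 (SDF x + 0.2), stored in the xy plane.
    Part 1 is the half-space z > 0.3 (SDF 0.3 - z), stored in the xz plane.
    Both SDFs are linear, so bilinear sampling reproduces them up to rounding.
    """
    res = 5
    xy = make_plane(res, lambda u, v: [u + 0.2, 0.0])  # u = x
    xz = make_plane(res, lambda u, v: [0.0, 0.3 - v])  # v = z
    yz = make_plane(res, lambda u, v: [0.0, 0.0])
    weights = [[1.0, 0.0], [0.0, 1.0]]  # identity decoder: channel k -> part k
    biases = [0.0, 0.0]
    return (xy, xz, yz), weights, biases


if __name__ == "__main__":
    planes, weights, biases = build_demo_scene()
    for point in [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.9), (0.5, 0.0, 0.0)]:
        dist, part = scene_sdf(planes, point, weights, biases)
        print(point, round(dist, 3), part)
```

In this toy scene the first query point lies inside part 0 (negative distance), the second inside part 1, and the third outside both, nearest to part 1. Because every part is read from the same tri-plane in one query, separating the scene into parts costs no extra passes; in the actual method the planes come from the diffusion model and the decoder is learned.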
