Interactive Scene Authoring with Specialized Generative Primitives
This work addresses the difficulty non-experts face with complex 3D design tools by offering a lightweight and controllable alternative. The contribution appears incremental, however, since it builds on existing methods such as 3D Gaussian Splatting and Generative Cellular Automata.
The paper aims to let non-expert users create high-quality 3D scenes. It introduces Specialized Generative Primitives, a framework for interactive scene authoring in which each primitive trains within 10 minutes and new assets and scenes are created in a few minutes.
Generating high-quality 3D digital assets often requires expert knowledge of complex design tools. We introduce Specialized Generative Primitives, a generative framework that allows non-expert users to author high-quality 3D scenes in a seamless, lightweight, and controllable manner. Each primitive is an efficient generative model that captures the distribution of a single exemplar from the real world. With our framework, users capture a video of an environment, which we turn into a high-quality and explicit appearance model thanks to 3D Gaussian Splatting. Users then select regions of interest guided by semantically-aware features. To create a generative primitive, we adapt Generative Cellular Automata to single-exemplar training and controllable generation. We decouple the generative task from the appearance model by operating on sparse voxels and we recover a high-quality output with a subsequent sparse patch consistency step. Each primitive can be trained within 10 minutes and used to author new scenes interactively in a fully compositional manner. We showcase interactive sessions where various primitives are extracted from real-world scenes and controlled to create 3D assets and scenes in a few minutes. We also demonstrate additional capabilities of our primitives: handling various 3D representations to control generation, transferring appearances, and editing geometries.
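To make the "operating on sparse voxels" idea concrete, the following is a minimal sketch rather than the authors' implementation. It quantizes 3D points (standing in for Gaussian centers) into a sparse set of occupied cells and then applies one stochastic, cellular-automaton-style update that resamples occupancy only in the neighborhood of the current state. All names (`voxelize`, `neighborhood`, `gca_step`, `toy_prob`) are hypothetical, and the hand-written `toy_prob` rule is an assumption that replaces the learned transition model a Generative Cellular Automaton would use.

```python
import random
from collections import defaultdict
from itertools import product


def voxelize(centers, voxel_size):
    """Map each 3D point to the integer cell containing it.

    Returns {cell: [indices of the points in that cell]}, so appearance data
    can stay attached to the points while generation works on cells only.
    """
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(centers):
        cell = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        cells[cell].append(i)
    return dict(cells)


def neighborhood(occupied, radius=1):
    """All cells within `radius` (Chebyshev distance) of any occupied cell."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    return {
        (x + dx, y + dy, z + dz)
        for (x, y, z) in occupied
        for (dx, dy, dz) in offsets
    }


def gca_step(occupied, occupancy_prob, rng, radius=1):
    """One stochastic update: resample every cell near the current state."""
    return {
        cell
        for cell in neighborhood(occupied, radius)
        if rng.random() < occupancy_prob(cell, occupied)
    }


def toy_prob(cell, occupied):
    """Hypothetical stand-in for a learned model: fill a 4-cell bar along x."""
    x, y, z = cell
    return 1.0 if (y == 0 and z == 0 and 0 <= x < 4) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    state = {(0, 0, 0)}
    for _ in range(4):
        state = gca_step(state, toy_prob, rng)
    print(sorted(state))
```

Because each update only visits cells near the occupied set, the cost scales with the size of the shape rather than with the full grid. In this sketch, a user-supplied mask or coarse shape could be folded into the probability function to steer generation. How the paper actually implements such control is not specified in the abstract.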