CV · Mar 15, 2024

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

arXiv:2403.09981v322.133 citations · h-index: 6 · Has Code · 3DV

Originality: Incremental advance

AI Analysis

This work addresses the under-explored problem of controllable 3D generation from text for applications in content creation and design, representing an incremental advance by combining existing techniques with novel architectural and representation improvements.

The paper tackles controllable text-to-3D generation by introducing Multi-view ControlNet (MVControl) to enhance multi-view diffusion models with input conditions like edge maps, and proposes an efficient pipeline using 3D Gaussians and SuGaR for improved geometry, achieving robust generalization and high-quality 3D content generation.

While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which is the focus of this work. We make two contributions. First, we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl can provide 3D diffusion guidance for optimization-based 3D generation. Second, we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithms. Building upon our MVControl architecture, we employ a hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Project page: https://lizhiqi49.github.io/MVControl/.
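The idea of binding Gaussians to mesh triangle faces can be illustrated with a minimal sketch. It is not the paper's or SuGaR's implementation: the function name `bound_gaussian_centers`, the data layout, and the barycentric weights are all assumptions chosen for illustration, and only Gaussian centers are modeled (covariance, opacity, and color are omitted). The point it shows is that when each center is stored as fixed barycentric weights on a face, editing the mesh vertices moves the attached Gaussians with the surface.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bound_gaussian_centers(
    vertices: Sequence[Vec3],
    faces: Sequence[Tuple[int, int, int]],
    barycentric: Sequence[Vec3],
) -> List[Vec3]:
    """Return one Gaussian center per (face, barycentric weight triple).

    Hypothetical helper: each center is a weighted combination of the
    three vertices of its face, so it always lies on that triangle.
    """
    centers: List[Vec3] = []
    for i0, i1, i2 in faces:
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        for w0, w1, w2 in barycentric:
            center = tuple(
                w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2)
            )
            centers.append(center)
    return centers


# One triangle in the z = 0 plane with a single bound Gaussian (illustrative values).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
weights = [(0.5, 0.25, 0.25)]

before = bound_gaussian_centers(vertices, faces, weights)

# "Sculpt" the mesh by lifting the first vertex; the Gaussian follows the face.
vertices[0] = (0.0, 0.0, 2.0)
after = bound_gaussian_centers(vertices, faces, weights)
```

In this sketch the Gaussian centers are never optimized directly; only the mesh vertices change, which is one way to read the abstract's claim that the hybrid representation enables sculpting geometry on the mesh.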

