CV · Jul 26, 2023

Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation

arXiv:2307.13908v1 · 48 citations · h-index: 71
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating realistic and controllable 3D models from text for applications in computer graphics and AI, representing an incremental improvement over existing methods.

The paper tackles view inconsistency and arbitrary shapes in text-to-3D generation. It proposes Points-to-3D, a framework that uses sparse 3D points from Point-E and a point cloud guidance loss to guide NeRF-based generation. Qualitative and quantitative comparisons show improved view consistency and shape controllability.

Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation, using 2D diffusion priors to supervise the generation of 3D models such as NeRF. However, score distillation is prone to view inconsistency, and implicit NeRF modeling can produce arbitrary shapes, which leads to less realistic and less controllable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E, conditioned on a single reference image, as the geometric prior. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on the text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.
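The abstract does not give the form of the point cloud guidance loss, so the following is only a minimal illustrative sketch of the general idea: positions sampled in the NeRF volume are pushed toward high occupancy when they lie near a sparse guide point and toward low occupancy otherwise. The function names, the fixed `radius` threshold, and the binary cross-entropy objective are all assumptions made for illustration and should not be read as the paper's actual formulation.

```python
import numpy as np


def nearest_point_distance(samples, guide_points):
    """Distance from each sample (N, 3) to its nearest guide point (M, 3)."""
    # Broadcast to (N, M, 3), then reduce to per-sample nearest distance (N,).
    diff = samples[:, None, :] - guide_points[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def point_guidance_loss(samples, occupancy, guide_points, radius=0.1, eps=1e-6):
    """Hypothetical guidance loss (an assumption, not the paper's definition).

    samples:      (N, 3) positions sampled along NeRF rays
    occupancy:    (N,) predicted occupancy in [0, 1] at those positions
    guide_points: (M, 3) sparse point cloud, e.g. as produced by Point-E
    radius:       assumed distance below which a sample counts as "on the shape"
    """
    dist = nearest_point_distance(samples, guide_points)
    # Samples close to the sparse points should be occupied, the rest empty.
    target = (dist < radius).astype(np.float64)
    occ = np.clip(occupancy, eps, 1.0 - eps)
    bce = -(target * np.log(occ) + (1.0 - target) * np.log(1.0 - occ))
    return float(bce.mean())


# Toy usage: two guide points, one sample near the shape and one far from it.
guide = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
samples = np.array([[0.05, 0.0, 0.0], [0.5, 0.5, 0.5]])
aligned = point_guidance_loss(samples, np.array([0.99, 0.01]), guide)
misaligned = point_guidance_loss(samples, np.array([0.01, 0.99]), guide)
print(aligned < misaligned)
```

In this toy setup, an occupancy field that agrees with the sparse points receives a lower loss than one that contradicts them. In a real pipeline a term like this would be combined with the score distillation objective and computed in a differentiable framework rather than NumPy.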

