CV · Jul 30, 2023

HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation

Jinbo Wu, Xiaobo Gao, Xing Liu, Zhengyang Shen, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding

arXiv:2307.16183v118.730 citations · h-index: 60

Originality: Incremental advance

AI Analysis

This work addresses text-to-3D generation for applications such as gaming or design, but it appears incremental because it builds on existing methods such as Magic3D and score distillation losses.

The paper tackles the problem of generating detailed 3D models from text by leveraging 2D diffusion priors. It proposes HD-Fusion, which combines multiple noise estimation processes to enable higher-resolution rendering, and reports improved quality and detail compared to the baselines.

In this paper, we study text-to-3D content generation leveraging 2D diffusion priors to enhance the quality and detail of the generated 3D models. Recent progress (Magic3D) in text-to-3D has shown that employing high-resolution (e.g., 512 x 512) renderings can lead to the production of high-quality 3D models using latent diffusion priors. To enable rendering at even higher resolutions, which has the potential to further augment the quality and detail of the models, we propose a novel approach that combines multiple noise estimation processes with a pretrained 2D diffusion prior. Distinct from Bar-Tal et al.'s study, which binds multiple denoised results to generate images from text, our approach integrates the computation of score distillation losses such as the SDS loss and the VSD loss, which are essential techniques for 3D content generation with 2D diffusion priors. We experimentally evaluated the proposed approach. The results show that the proposed approach generates higher-quality details than the baselines.
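
The idea of combining multiple noise estimates inside a score distillation loss can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the estimates are combined, so the code below assumes a simple sliding-window average over a large rendered latent, in the spirit of Bar-Tal et al. The names `window_starts`, `fused_noise_estimate`, `sds_gradient`, and the `predict_noise` callback (a stand-in for a pretrained 2D diffusion prior) are all hypothetical, as are the window and stride values.

```python
import numpy as np


def window_starts(size, window, stride):
    """Offsets of overlapping windows that cover [0, size), including the far edge."""
    if size < window:
        raise ValueError("input must be at least as large as the window")
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)  # extra window so the border is covered
    return starts


def fused_noise_estimate(noisy, predict_noise, window=64, stride=32):
    """Run the (hypothetical) noise predictor on each crop and average the overlaps.

    noisy: array of shape (H, W, C), a noised render larger than the prior's
    native resolution. predict_noise: callable mapping a (window, window, C)
    crop to a noise estimate of the same shape.
    """
    h, w, _ = noisy.shape
    total = np.zeros_like(noisy, dtype=float)
    count = np.zeros((h, w, 1), dtype=float)
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            rows = slice(top, top + window)
            cols = slice(left, left + window)
            total[rows, cols] += predict_noise(noisy[rows, cols])
            count[rows, cols] += 1.0
    return total / count


def sds_gradient(noisy, noise, predict_noise, weight=1.0, window=64, stride=32):
    """SDS-style gradient w.r.t. the render: weight * (estimated noise - injected noise).

    Assumption: the fused estimate simply replaces the single-pass estimate.
    """
    estimate = fused_noise_estimate(noisy, predict_noise, window, stride)
    return weight * (estimate - noise)
```

In a full pipeline this gradient would be backpropagated through a differentiable renderer to the 3D representation's parameters; that step, the noise schedule, and text conditioning are omitted here.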
