CV, Nov 15, 2023

DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang

arXiv:2311.09217v1 | 38.3237 citations | h-index: 36

Originality: Highly original

AI Analysis

This paper addresses efficient, high-quality 3D generation for applications in computer vision and graphics. It is a novel method for a known bottleneck rather than a foundational advance.

The paper tackles 3D generation by proposing DMV3D, a method that uses a transformer-based 3D reconstruction model to denoise noisy multi-view images within a diffusion process. It achieves single-stage 3D generation in about 30 seconds on a single A100 GPU and reports state-of-the-art results in single-image reconstruction and text-to-3D generation.

We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in ~30s on a single A100 GPU. We train DMV3D on large-scale multi-view image datasets of highly diverse objects using only image reconstruction losses, without accessing 3D assets. We demonstrate state-of-the-art results for the single-image reconstruction problem, where probabilistic modeling of unseen object parts is required for generating diverse reconstructions with sharp textures. We also show high-quality text-to-3D generation results that outperform previous 3D diffusion models. Our project website is at: https://justimyhxu.github.io/projects/dmv3d/
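The idea of denoising multi-view images "via reconstruction and rendering" can be illustrated with a toy sampling loop. This is only a sketch under stated assumptions, not the authors' implementation: `toy_reconstruct_and_render` is a hypothetical stand-in for the learned triplane-NeRF reconstructor (here it simply averages the views), the linear `alpha_bar` schedule is arbitrary, and the DDIM-style deterministic update is an assumption that the abstract does not specify.

```python
import numpy as np


def toy_reconstruct_and_render(noisy_views):
    """Hypothetical stand-in for the learned reconstructor (NOT the DMV3D model).

    A real model would build a 3D representation from the noisy views and
    render it back at each camera. Here the "3D representation" is just the
    per-pixel mean over views, and "rendering" copies it to every view.
    """
    shared = noisy_views.mean(axis=0)
    return np.broadcast_to(shared, noisy_views.shape).copy()


def sample_views(num_views=4, height=8, width=8, num_steps=50, seed=0):
    """Run a DDIM-style loop whose clean-image estimate comes from rendering."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: cumulative signal level from ~0 (pure noise) to 1 (clean).
    alpha_bar = np.linspace(1e-4, 1.0, num_steps + 1)
    # Start from Gaussian noise for every view.
    x = rng.standard_normal((num_views, height, width, 3))
    for i in range(num_steps):
        a_t, a_next = alpha_bar[i], alpha_bar[i + 1]
        # The denoiser predicts clean images by reconstructing and re-rendering.
        x0_pred = toy_reconstruct_and_render(x)
        # Noise implied by the current sample and the clean-image estimate.
        eps_pred = (x - np.sqrt(a_t) * x0_pred) / np.sqrt(1.0 - a_t)
        # Deterministic step to the next (less noisy) level.
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps_pred
    return x


views = sample_views()
print(views.shape)
```

Because every clean-image estimate is rendered from one shared representation, the final views agree with each other by construction. In this toy that agreement is trivial; in the described method it would come from the 3D model itself.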
