CV GR Nov 21, 2024

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction

Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang

arXiv:2411.14384v519.423 citations | h-index: 26 | Has Code

Originality: Highly original

AI Analysis

This paper addresses the challenge of fast, scalable 3D generation from images for graphics and vision applications. Rather than an incremental refinement, it introduces a new method for a known bottleneck.

The paper tackles generating and reconstructing 3D content from a single image. It proposes DiffusionGS, a single-stage 3D diffusion model that directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency and to handle prompt views from any direction, not only object-centric inputs. Compared to state-of-the-art methods, it reports PSNR gains of 2.20 dB on objects and 1.34 dB on scenes, with FID improvements of 23.25 and 19.16 respectively, and runs over 5x faster (~6s on an A100 GPU).

Existing feedforward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes and mainly handle object-centric cases. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object generation and scene reconstruction from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency and to allow the model to generate robustly given prompt views from any direction, beyond object-centric inputs. In addition, to improve the capability and generality of DiffusionGS, we scale up 3D training data by developing a scene-object mixed training strategy. Experiments show that DiffusionGS improves on state-of-the-art methods by 2.20 dB/23.25 and 1.34 dB/19.16 in PSNR/FID for objects and scenes, respectively, without a depth estimator. Our method is also over 5$\times$ faster ($\sim$6s on an A100 GPU). Our project page at https://caiyuanhao1998.github.io/project/DiffusionGS/ shows video and interactive results. The code and models are publicly available at https://github.com/caiyuanhao1998/Open-DiffusionGS
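To put the reported PSNR gains in perspective, the sketch below uses the standard PSNR definition (10 · log10(MAX² / MSE)) to convert a dB gain into the implied reduction in mean squared error. It is not the paper's evaluation code; the function names and the baseline MSE value are hypothetical and chosen only for illustration.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error and peak pixel value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)


# PSNR gains quoted in the abstract (objects and scenes).
object_ratio = mse_ratio_for_gain(2.20)
scene_ratio = mse_ratio_for_gain(1.34)

# Hypothetical baseline MSE, used only to show the relationship end to end.
baseline_mse = 0.01
improved_mse = baseline_mse / object_ratio

print(f"2.20 dB gain -> MSE lower by a factor of about {object_ratio:.2f}")
print(f"1.34 dB gain -> MSE lower by a factor of about {scene_ratio:.2f}")
print(f"PSNR: {psnr(baseline_mse):.2f} dB -> {psnr(improved_mse):.2f} dB")
```

Because PSNR is logarithmic, a fixed dB gain corresponds to the same relative MSE reduction regardless of the baseline, so the comparison does not depend on the hypothetical baseline chosen here.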
