Large Point-to-Gaussian Model for Image-to-3D Generation
This work addresses the problem of efficient and high-quality 3D asset creation from images for applications in graphics and AI, representing an incremental improvement over existing methods.
The paper tackles the challenge of generating 3D assets from 2D images by proposing a large Point-to-Gaussian model that uses an initial point cloud from a 3D diffusion model to generate Gaussian parameters, significantly improving image-to-3D generation and achieving state-of-the-art results on the GSO and Objaverse datasets.
Recently, image-to-3D approaches built on large reconstruction models, particularly 3D Gaussian reconstruction models, have significantly advanced both the quality and the speed of 3D asset generation. Existing large 3D Gaussian models directly map a 2D image to 3D Gaussian parameters, but regressing 3D Gaussian representations from a 2D image is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation, which takes as input an initial point cloud produced by a large 3D diffusion model conditioned on the 2D image and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, which significantly facilitates image-to-3D generation. Moreover, we present the \textbf{A}ttention mechanism, \textbf{P}rojection mechanism, and \textbf{P}oint feature extractor, dubbed the \textbf{APP} block, for fusing image features with point cloud features. Extensive qualitative and quantitative experiments on the GSO and Objaverse datasets demonstrate the effectiveness of the proposed approach and show that it achieves state-of-the-art performance.
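The abstract does not say how the APP block is built, so the following is only a minimal NumPy sketch of one plausible way its three components could fuse point and image features before a head regresses per-point Gaussian parameters. Everything in it is a hypothetical assumption, not the paper's design: the function names (`point_feature_extractor`, `project_and_sample`, `cross_attention`, `gaussian_head`, `app_block`), the pinhole camera with nearest-pixel feature sampling, single-head cross-attention, additive fusion of the three feature streams, and the 14-value parameter layout (3 offset, 3 scale, 4 rotation quaternion, 1 opacity, 3 color). Random weights stand in for trained ones.

```python
import numpy as np

# HYPOTHETICAL sketch of an APP-style fusion block; not the paper's implementation.
# Assumed: pinhole camera, nearest-pixel sampling, single-head cross-attention,
# a shared per-point MLP as the point feature extractor, additive fusion.
rng = np.random.default_rng(0)

def point_feature_extractor(points, w1, w2):
    """Shared per-point MLP (PointNet-style): (N, 3) -> (N, C)."""
    return np.maximum(points @ w1, 0.0) @ w2  # ReLU between the two layers

def project_and_sample(points, feat_map, focal, cx, cy):
    """Project camera-space points with a pinhole model and read the
    nearest image feature: (N, 3), (H, W, C) -> (N, C)."""
    H, W, _ = feat_map.shape
    z = np.clip(points[:, 2], 1e-6, None)  # guard against division by zero
    u = focal * points[:, 0] / z + cx
    v = focal * points[:, 1] / z + cy
    ui = np.clip(np.rint(u).astype(int), 0, W - 1)  # clamp to image bounds
    vi = np.clip(np.rint(v).astype(int), 0, H - 1)
    return feat_map[vi, ui]

def cross_attention(queries, tokens, wq, wk, wv):
    """Single-head attention from point queries to flattened image tokens."""
    q, k, v = queries @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def gaussian_head(fused, w_out):
    """Map fused features to per-point Gaussian parameters (assumed layout)."""
    raw = fused @ w_out
    offset = raw[:, 0:3]
    scale = np.exp(raw[:, 3:6])  # exp keeps scales positive
    rot = raw[:, 6:10]
    rot = rot / (np.linalg.norm(rot, axis=1, keepdims=True) + 1e-8)  # unit quaternion
    opacity = 1.0 / (1.0 + np.exp(-raw[:, 10:11]))  # sigmoid -> (0, 1)
    color = 1.0 / (1.0 + np.exp(-raw[:, 11:14]))
    return offset, scale, rot, opacity, color

def app_block(points, feat_map, params, focal=32.0):
    """Fuse point-branch, projected, and attended image features by summation."""
    H, W, C = feat_map.shape
    f_point = point_feature_extractor(points, params["w1"], params["w2"])
    f_proj = project_and_sample(points, feat_map, focal, W / 2, H / 2)
    tokens = feat_map.reshape(-1, C)
    f_attn = cross_attention(f_point, tokens, params["wq"], params["wk"], params["wv"])
    return f_point + f_proj + f_attn

# Usage with random stand-ins for the diffusion point cloud and image features.
N, H, W, C, D = 1024, 16, 16, 32, 16
points = rng.normal(size=(N, 3)) * 0.3 + np.array([0.0, 0.0, 2.0])  # in front of camera
feat_map = rng.normal(size=(H, W, C))
params = {
    "w1": rng.normal(size=(3, 64)) * 0.1,
    "w2": rng.normal(size=(64, C)) * 0.1,
    "wq": rng.normal(size=(C, D)) * 0.1,
    "wk": rng.normal(size=(C, D)) * 0.1,
    "wv": rng.normal(size=(C, C)) * 0.1,
    "w_out": rng.normal(size=(C, 14)) * 0.1,
}
fused = app_block(points, feat_map, params)
offset, scale, rot, opacity, color = gaussian_head(fused, params["w_out"])
centers = points + offset  # Gaussians are placed relative to the input points
```

Predicting offsets relative to the input points, rather than absolute positions, is one way the point cloud could serve as the geometry prior the abstract describes: the learned head then only needs to refine the diffusion model's coarse geometry instead of regressing 3D structure from the image alone.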