Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
This work addresses the computational cost and data scarcity that limit text-to-3D generation, targeting applications that require efficient 3D asset creation.
The paper tackles the problem of generating high-quality 3D objects from text by introducing Geometry Image Diffusion (GIMDiffusion), which uses geometry images to represent 3D shapes as 2D images, enabling fast generation speeds comparable to text-to-image models and strong generalization with limited 3D data.
Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and complex 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to use only high-quality training data) while retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion generates 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.
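To make the idea of a geometry image concrete, the following is a minimal sketch of how a 2D image whose three channels store XYZ surface positions can be turned back into a triangle mesh. It is an illustration under stated assumptions and not the GIMDiffusion pipeline: the function name `geometry_image_to_mesh`, the optional `mask` argument for unused pixels, and the fixed diagonal used to split each pixel quad are all hypothetical choices made here.

```python
import numpy as np


def geometry_image_to_mesh(gim, mask=None):
    """Triangulate an (H, W, 3) geometry image into (vertices, faces).

    Each pixel stores the XYZ position of one surface sample. Neighbouring
    pixels are connected into two triangles per 2x2 pixel quad.
    `mask` (hypothetical) marks which pixels hold valid surface samples.
    """
    gim = np.asarray(gim, dtype=np.float64)
    h, w, _ = gim.shape
    if mask is None:
        mask = np.ones((h, w), dtype=bool)

    # One vertex per pixel, in row-major order.
    vertices = gim.reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)

    # Vertex indices at the four corners of every pixel quad.
    tl = idx[:-1, :-1].ravel()  # top-left
    tr = idx[:-1, 1:].ravel()   # top-right
    bl = idx[1:, :-1].ravel()   # bottom-left
    br = idx[1:, 1:].ravel()    # bottom-right

    # Split each quad along the tr-bl diagonal into two triangles.
    faces = np.concatenate(
        [np.stack([tl, bl, tr], axis=1), np.stack([tr, bl, br], axis=1)],
        axis=0,
    )

    # Keep only triangles whose three corners are all valid pixels.
    keep = mask.ravel()[faces].all(axis=1)
    return vertices, faces[keep]


if __name__ == "__main__":
    # A flat 3x3 grid in the z = 0 plane serves as a toy geometry image.
    ys, xs = np.meshgrid(np.arange(3.0), np.arange(3.0), indexing="ij")
    toy_gim = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
    verts, tris = geometry_image_to_mesh(toy_gim)
    print(verts.shape, tris.shape)
```

Because the surface lives on a regular pixel grid, connectivity is implicit in pixel adjacency, which is what lets an image model operate on the shape without a 3D-aware architecture.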