CV, GR | Sep 5, 2024

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

arXiv:2409.03718v1 | 17 citations | h-index: 4
Originality: Highly original
AI Analysis

The paper addresses the computational cost and data scarcity of text-to-3D generation, targeting applications that require efficient 3D asset creation.

The paper tackles the problem of generating high-quality 3D objects from text by introducing Geometry Image Diffusion (GIMDiffusion), which uses geometry images to represent 3D shapes as 2D images, enabling fast generation speeds comparable to text-to-image models and strong generalization with limited 3D data.

Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and complex 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to use only high-quality training data) and retains compatibility with guidance techniques such as IP-Adapter. In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.
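
To make the idea of a geometry image concrete, here is a minimal sketch of how a 2D grid of XYZ samples can be turned back into a triangle mesh. It is not the authors' code: the function name `geometry_image_to_mesh`, the nested-list layout, and the optional `mask` (marking which pixels belong to the surface) are all assumptions chosen for illustration. Each pixel stores one 3D point, and neighbouring pixels are connected into two triangles per grid cell.

```python
def geometry_image_to_mesh(gim, mask=None):
    """Triangulate a geometry image (hypothetical helper, illustration only).

    gim  : H x W nested list; each pixel is an (x, y, z) surface point.
    mask : optional H x W nested list of bools; False marks pixels that
           do not belong to the surface (an assumed convention).
    Returns (vertices, faces), where faces index into vertices.
    """
    h, w = len(gim), len(gim[0])
    if mask is None:
        mask = [[True] * w for _ in range(h)]

    # One vertex per pixel, flattened in row-major order.
    vertices = [tuple(gim[i][j]) for i in range(h) for j in range(w)]

    faces = []
    for i in range(h - 1):
        for j in range(w - 1):
            # Skip grid cells that touch a pixel outside the surface.
            if not (mask[i][j] and mask[i][j + 1]
                    and mask[i + 1][j] and mask[i + 1][j + 1]):
                continue
            a = i * w + j      # top-left pixel
            b = a + 1          # top-right pixel
            c = a + w          # bottom-left pixel
            d = c + 1          # bottom-right pixel
            # Split the quad into two triangles.
            faces.append((a, b, c))
            faces.append((b, d, c))
    return vertices, faces


# Usage: a flat 3x3 patch in the z = 0 plane.
gim = [[(float(j), float(i), 0.0) for j in range(3)] for i in range(3)]
vertices, faces = geometry_image_to_mesh(gim)
```

Because connectivity is implied by the pixel grid, the surface can be stored and processed as an ordinary image, which is what lets a 2D image diffusion model generate it.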

