CV | Jun 14, 2023

ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

arXiv:2306.08183v2 | 3 citations | h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of generating 3D shapes from text for applications in design and visualization, offering a more efficient zero-shot approach.

It tackles text-to-shape generation without 3D supervision or expensive inference-time optimization, achieving open-vocabulary shape generation by adapting the architecture of existing feed-forward models and combining a data-free CLIP loss with contrastive losses.

Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.
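To make the role of the two loss terms more concrete, here is a minimal NumPy sketch of how a CLIP-similarity term and a batch contrastive term could be combined. It is not the paper's implementation: the function names, the InfoNCE-style form of the contrastive term, the `temperature` and `weight` values, and the use of plain arrays in place of real CLIP image and text embeddings are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def clip_similarity_loss(image_emb, text_emb):
    # Hypothetical data-free term: mean (1 - cosine similarity) between the
    # embedding of each rendered shape and the embedding of its own prompt.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return float(np.mean(1.0 - np.sum(img * txt, axis=-1)))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Hypothetical InfoNCE-style term: each rendered shape should match its own
    # prompt better than the other prompts in the batch.
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature                 # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def total_loss(image_emb, text_emb, weight=0.1):
    # Weighted sum of both terms; the weight is an arbitrary illustrative value.
    return clip_similarity_loss(image_emb, text_emb) + weight * contrastive_loss(image_emb, text_emb)

# Stand-in embeddings: 4 prompts, 16 dimensions (real CLIP embeddings are larger).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 16))
matched = text_emb.copy()                              # every shape matches its own prompt
collapsed = np.tile(text_emb.mean(axis=0), (4, 1))     # every prompt yields the same shape

print("matched  :", contrastive_loss(matched, text_emb))
print("collapsed:", contrastive_loss(collapsed, text_emb))
```

In this sketch, a collapsed generator that outputs the same shape for every prompt cannot push the contrastive term below log(B) for a batch of B prompts, whereas a generator whose outputs track their prompts can. That is one plausible reading of why a contrastive term discourages mode collapse.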

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
