CV · Mar 23, 2023

TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision

arXiv:2303.13273v1 · 38 citations · h-index: 51
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating 3D shapes from text for applications in graphics and AI. It offers a more efficient alternative to methods that require ground-truth labels or extensive optimization, though its approach appears incremental.

The paper tackles the problem of generating controllable 3D textured shapes from textual descriptions. It introduces TAPS3D, a framework that trains a generator using pseudo captions built from words retrieved from the CLIP vocabulary, together with low-level image regularization. The trained model produces high-fidelity shapes without additional optimization during inference.

In this paper, we investigate an open research task of generating controllable 3D textured shapes from the given textual descriptions. Previous works either require ground truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions provide high-level semantic supervision for generated 3D shapes. Further, in order to produce fine-grained textures and increase geometry diversity, we propose to adopt low-level image regularization to enable fake-rendered images to align with the real ones. During the inference phase, our proposed model can generate 3D textured shapes from the given text without any additional optimization. We conduct extensive experiments to analyze each of our proposed components and show the efficacy of our framework in generating high-fidelity 3D textured and text-relevant shapes.
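The pseudo-caption step described in the abstract (retrieve relevant words for a rendered image, then fill a template) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the template string, the toy vocabulary, and the hand-written 3-dimensional vectors are all hypothetical stand-ins. In practice the embeddings would come from CLIP's image and text encoders; here cosine similarity over toy vectors shows only the retrieval-and-template logic.

```python
import numpy as np


def top_k_words(image_emb, word_embs, vocab, k=2):
    """Return the k vocabulary words most cosine-similar to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    words = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    sims = words @ img                      # cosine similarity per word
    order = np.argsort(-sims)[:k]           # indices of the k highest scores
    return [vocab[i] for i in order]


def build_pseudo_caption(words, category, template="a {words} {category}"):
    """Fill a caption template (hypothetical format) with the retrieved words."""
    return template.format(words=" ".join(words), category=category)


# Toy stand-ins for CLIP embeddings (assumption: real ones are high-dimensional).
vocab = ["red", "wooden", "sports", "tall"]
word_embs = np.array([
    [1.0, 0.0, 0.0],   # red
    [0.0, 1.0, 0.0],   # wooden
    [0.7, 0.0, 0.7],   # sports
    [0.0, 0.0, 1.0],   # tall
])
image_emb = np.array([0.9, 0.1, 0.4])       # embedding of one rendered view

retrieved = top_k_words(image_emb, word_embs, vocab, k=2)
caption = build_pseudo_caption(retrieved, category="car")
print(caption)
```

A caption built this way would act as the text condition paired with the rendered image during training, which is how the abstract describes obtaining supervision without ground-truth captions.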

Code Implementations: 1 repo