CV AI GR LG · Sep 1, 2022

Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets

Kristofer Schlachter, Benjamin Ahlbrand, Zhu Wang, Valerio Ortenzi, Ken Perlin

arXiv:2209.00682v1 · 7.38 citations · h-index: 35

Originality: Incremental advance

AI Analysis

This work addresses the need for easier 3D content creation for artists and designers by improving control over retrieval. It appears incremental, since it builds on existing CLIP-based methods.

The paper tackles 3D content creation by retrieving high-quality 3D assets from multi-modal inputs: sketches, images, and text. It uses CLIP to obtain latent features and fuses them across modalities to give artists more control. The resulting method supports conditional retrieval from a 3D database and lets users explore different feature combinations and weightings.

When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.
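
The idea of retrieving assets with a weighted combination of input embeddings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes fusion is a weighted sum of L2-normalized embeddings followed by cosine-similarity ranking, which the abstract does not confirm. The function names (`normalize`, `fuse`, `retrieve`) and the three-dimensional toy vectors are hypothetical stand-ins; a real system would use CLIP embeddings of the sketches, images, text, and rendered 3D assets.

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def fuse(embeddings, weights):
    """Assumed fusion rule: weighted sum of unit-length embeddings,
    re-normalized so the result can be compared by cosine similarity."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    fused = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        unit = normalize(emb)
        for i in range(dim):
            fused[i] += w * unit[i]
    return normalize(fused)

def retrieve(query, database, top_k=3):
    """Rank database entries by cosine similarity to a unit-length query."""
    scored = []
    for name, vec in database.items():
        unit = normalize(vec)
        score = sum(q * a for q, a in zip(query, unit))
        scored.append((score, name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# Hypothetical toy data standing in for CLIP embeddings.
database = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
sketch_emb = [0.9, 0.1, 0.0]  # pretend embedding of a chair-like sketch
text_emb = [0.1, 0.9, 0.0]    # pretend embedding of the text "table"

# Shifting the weights moves the result toward one modality or the other.
sketch_heavy = retrieve(fuse([sketch_emb, text_emb], [0.8, 0.2]), database, top_k=1)
text_heavy = retrieve(fuse([sketch_emb, text_emb], [0.2, 0.8]), database, top_k=1)
```

In this sketch the weights act as the artist-facing control: raising the sketch weight pulls the fused query toward the sketch embedding, and raising the text weight pulls it toward the text embedding.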
