CV | Oct 16, 2024

TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt

arXiv:2410.21299v2 | 6 citations | h-index: 6 | IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in customized 3D generation for applications requiring visual and textual inputs, representing an incremental improvement over existing methods.

The paper targets low-quality text-to-3D generation under multi-condition inputs, such as text combined with visual prompts. It proposes Classifier Score Matching (CSM) as a replacement for Score Distillation Sampling (SDS) to reduce noise and deviations from the target mode, and its experiments report stable, high-quality customized 3D generation.

In recent years, advancements in generative models have significantly expanded the capabilities of text-to-3D generation. Many approaches rely on Score Distillation Sampling (SDS) technology. However, SDS struggles to accommodate multi-condition inputs, such as text and visual prompts, in customized generation tasks. To explore the core reasons, we decompose SDS into a difference term and a classifier-free guidance term. Our analysis identifies the core issue as arising from the difference term and the random noise addition during the optimization process, both contributing to deviations from the target mode during distillation. To address this, we propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in SDS and uses a deterministic noise addition process to reduce noise during optimization, effectively overcoming the low-quality limitations of SDS in our customized generation framework. Based on CSM, we integrate visual prompt information with an attention fusion mechanism and sampling guidance techniques, forming the Visual Prompt CSM (VPCSM) algorithm. Furthermore, we introduce a Semantic-Geometry Calibration (SGC) module to enhance quality through improved textual information integration. We present our approach as TV-3DG, with extensive experiments demonstrating its capability to achieve stable, high-quality, customized 3D generation. Project page: https://yjhboy.github.io/TV-3DG
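The decomposition mentioned in the abstract can be illustrated with a small numerical sketch. It assumes the usual classifier-free guidance form of the SDS update direction, in which the guided noise prediction minus the injected noise splits into an unconditional-minus-noise "difference" part and a scaled conditional-minus-unconditional "guidance" part. The paper's exact notation, weighting, and deterministic noise schedule are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: push the unconditional prediction toward the conditional one."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def sds_direction(eps_uncond, eps_cond, eps, scale):
    """SDS-style update direction: guided noise prediction minus the injected noise."""
    return cfg_noise(eps_uncond, eps_cond, scale) - eps

def split_sds(eps_uncond, eps_cond, eps, scale):
    """Split the SDS direction into (difference term, guidance term).

    Assumed form, not taken verbatim from the paper.
    """
    difference = eps_uncond - eps
    guidance = scale * (eps_cond - eps_uncond)
    return difference, guidance

def csm_like_direction(eps_uncond, eps_cond, scale):
    """Direction with the difference term dropped (a CSM-like sketch).

    It no longer depends on the sampled noise `eps` directly.
    """
    return scale * (eps_cond - eps_uncond)

# Stand-in arrays replace real diffusion-model outputs (assumption for illustration).
rng = np.random.default_rng(0)
eps_uncond, eps_cond, eps = rng.normal(size=(3, 8))
difference, guidance = split_sds(eps_uncond, eps_cond, eps, scale=7.5)

# The two terms add up to the full SDS direction.
assert np.allclose(sds_direction(eps_uncond, eps_cond, eps, 7.5), difference + guidance)
```

In this sketch the random noise enters the SDS direction only through the difference term, which is consistent with the abstract's claim that removing that term reduces noise in the update. In a real pipeline the predictions themselves still depend on how the noisy input was produced, which is where the deterministic noise addition would matter.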

