CV · AI · Apr 3, 2025

ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation

arXiv:2504.02316v1 · 1 citation · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses inconsistent 3D generation, a practical problem for content creators. It is an incremental advance because it builds on existing score distillation frameworks.

The paper tackles the multi-face Janus problem in zero-shot text-to-3D generation, which is caused by view biases in text-to-image models. It proposes ConsDreamer, which refines score distillation and outperforms existing methods in visual quality and multi-view consistency.

Recent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent view biases in T2I priors. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel framework that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise camera parameters; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer effectively mitigates the multi-face Janus problem in text-to-3D generation, outperforming existing methods in both visual quality and consistency.
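The "conditional and unconditional terms" mentioned above are the two noise predictions combined by classifier-free guidance inside score distillation. As a minimal sketch of that generic mechanism (the standard DreamFusion-style SDS update, not ConsDreamer's refined version), the guided prediction and gradient can be written with NumPy arrays standing in for the diffusion model's outputs. The function names, array shapes, and guidance scale are illustrative assumptions.

```python
import numpy as np

def cfg_noise_prediction(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def sds_gradient(eps_uncond, eps_cond, noise, guidance_scale, weight=1.0):
    """SDS gradient on the noisy rendering: w(t) * (eps_hat - eps).

    In a real pipeline this would be backpropagated through the differentiable
    renderer into the 3D Gaussian parameters; here it is just an array.
    """
    eps_hat = cfg_noise_prediction(eps_uncond, eps_cond, guidance_scale)
    return weight * (eps_hat - noise)

# Toy arrays standing in for the two U-Net predictions and the sampled noise.
rng = np.random.default_rng(0)
shape = (4, 64, 64)
grad = sds_gradient(rng.normal(size=shape), rng.normal(size=shape),
                    rng.normal(size=shape), guidance_scale=7.5)
print(grad.shape)  # (4, 64, 64)
```

In ConsDreamer's framing, the View Disentanglement Module changes the prompt that produces the conditional prediction, while the partial order loss targets the unconditional term.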
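The View Disentanglement Module is described only at a high level here. One hypothetical way to picture "decoupling irrelevant view components and injecting precise camera parameters" is to project assumed view directions (for example, embeddings of words such as "front" or "back") out of the prompt embedding, then add a projection of the camera pose. The view subspace, the camera feature layout, and `cam_proj` are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def remove_view_components(text_emb, view_dirs):
    """Project the span of the rows of view_dirs (k x d) out of text_emb (d,)."""
    basis, _ = np.linalg.qr(view_dirs.T)  # d x k orthonormal basis of the view subspace
    return text_emb - basis @ (basis.T @ text_emb)

def camera_features(elevation_deg, azimuth_deg, radius):
    """Hypothetical camera descriptor: sin/cos of both angles plus camera distance."""
    el, az = np.deg2rad([elevation_deg, azimuth_deg])
    return np.array([np.sin(el), np.cos(el), np.sin(az), np.cos(az), radius])

def disentangle_and_inject(text_emb, view_dirs, cam_proj, elevation_deg, azimuth_deg, radius):
    """Remove view-related directions, then add a projected camera encoding.

    cam_proj (d x 5) stands in for a learned layer; here it is any fixed matrix.
    """
    cleaned = remove_view_components(text_emb, view_dirs)
    return cleaned + cam_proj @ camera_features(elevation_deg, azimuth_deg, radius)

# Toy usage with random stand-ins for the prompt embedding and view directions.
rng = np.random.default_rng(0)
d = 8
text_emb = rng.normal(size=d)
view_dirs = rng.normal(size=(2, d))
cam_proj = rng.normal(size=(d, 5))
cond = disentangle_and_inject(text_emb, view_dirs, cam_proj, 15.0, 90.0, 2.5)
```

The orthogonal projection guarantees the cleaned embedding carries no component along the assumed view directions, so any viewpoint signal comes only from the explicit camera parameters.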
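The similarity-based partial order loss can likewise be illustrated under an assumed formulation: relative to a reference view, a view that is closer in azimuth should have a cosine similarity to the reference at least as high as a view that is farther away, with a hinge penalty on each violation. The feature vectors, `margin`, and pairwise averaging are hypothetical choices; the paper's exact loss may differ.

```python
import numpy as np

def angular_distance(a_deg, b_deg):
    """Smallest absolute azimuth difference in degrees, in [0, 180]."""
    d = np.abs(a_deg - b_deg) % 360.0
    return float(np.minimum(d, 360.0 - d))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def partial_order_loss(features, azimuths_deg, ref=0, margin=0.0):
    """Average hinge penalty over view pairs whose similarity order contradicts azimuth order."""
    n = len(features)
    sims = [cosine(features[ref], features[i]) for i in range(n)]
    dists = [angular_distance(azimuths_deg[ref], azimuths_deg[i]) for i in range(n)]
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if ref in (i, j) or not dists[i] < dists[j]:
                continue
            # View i is nearer to ref than view j, so it should be at least as similar.
            loss += max(0.0, sims[j] - sims[i] + margin)
            pairs += 1
    return loss / max(pairs, 1)

azimuths = np.array([0.0, 45.0, 90.0, 180.0])
rad = np.deg2rad(azimuths)
consistent = np.stack([np.cos(rad), np.sin(rad)], axis=1)  # features rotate with the camera
janus = consistent.copy()
janus[3] = consistent[0]  # back view duplicates the front view
```

With features that rotate with azimuth, `partial_order_loss(consistent, azimuths)` is zero; when the 180° view copies the front view (a Janus-like duplicate face), the penalty becomes positive.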

Code Implementations: 1 repo