CV · Oct 12, 2023

Consistent123: Improve Consistency for One Image to 3D Object Synthesis

arXiv:2310.08092v1 · 90 citations · h-index: 20
AI Analysis

The work addresses the view inconsistency that limits downstream tasks such as 3D reconstruction and image-to-3D generation, though it appears incremental because it builds on existing diffusion models.

The paper tackles view inconsistency in image-to-3D synthesis by proposing Consistent123, which synthesizes novel views simultaneously using additional cross-view attention layers and a shared self-attention mechanism. The authors report that it outperforms baselines in view consistency by a large margin.

Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models based on image-to-image translation have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation. To empower consistency, we propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism. The proposed attention mechanism improves the interaction across all synthesized views, as well as the alignment between the condition view and novel views. In the sampling stage, such architecture supports simultaneously generating an arbitrary number of views while training at a fixed length. We also introduce a progressive classifier-free guidance strategy to achieve the trade-off between texture and geometry for synthesized object views. Qualitative and quantitative experiments show that Consistent123 outperforms baselines in view consistency by a large margin. Furthermore, we demonstrate a significant improvement of Consistent123 on varying downstream tasks, showing its great potential in the 3D generation field. The project page is available at consistent-123.github.io.
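The progressive classifier-free guidance strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the schedule, so everything specific here is an assumption: the linear ramp, its direction, the start and end values, and the hypothetical helper names `guidance_scale` and `guided_prediction`. Only the combination rule (unconditional prediction plus a scaled difference) is the standard classifier-free guidance formula.

```python
from typing import List, Sequence


def guidance_scale(step: int, num_steps: int, start: float, end: float) -> float:
    """Return the guidance scale for a given sampling step.

    ASSUMPTION: a linear ramp from `start` to `end`. The source does not
    specify the schedule's shape, direction, or endpoint values.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return start
    t = step / (num_steps - 1)  # fraction of sampling completed, in [0, 1]
    return start + t * (end - start)


def guided_prediction(
    cond: Sequence[float], uncond: Sequence[float], scale: float
) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Illustrative values only: five sampling steps, scale ramping from 1.0 to 9.0.
scales = [guidance_scale(i, 5, 1.0, 9.0) for i in range(5)]
```

In a real sampler, `cond` and `uncond` would be the denoiser's noise predictions with and without the condition view, and the scale for the current step would be recomputed at every iteration. A varying scale is one way to trade off texture against geometry over the course of sampling, as the abstract describes.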
