CV | Apr 16, 2024

CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

arXiv:2404.10603v2 | 10 citations | h-index: 10 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses 3D fidelity issues in text-to-3D generation, which is important for applications in graphics and AI, but it is incremental as it builds on existing multi-view diffusion and NeRF methods.

The paper tackles the problem of poor 3D geometric fidelity in text-to-3D models, such as unreasonable concavities, by proposing CorrespondentDream, which uses cross-view correspondences from diffusion models to improve NeRF optimization, resulting in more coherent and smoothed object surfaces.

Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue: although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method that leverages annotation-free, cross-view correspondences obtained from the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models with geometries that are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a solid user study.
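To make the idea of annotation-free cross-view correspondences more concrete, here is a minimal sketch of one generic way to match features between two views: mutual nearest neighbors under cosine similarity. This is an illustrative assumption, not the paper's method; the abstract does not specify how correspondences are extracted from the U-Net or how they enter the loss. The function names and the toy 2D feature vectors are hypothetical stand-ins for per-pixel diffusion features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mutual_nearest_neighbors(feats_a, feats_b):
    """Return (i, j) pairs where feats_a[i] and feats_b[j] each select the
    other as their most similar feature. Hypothetical helper for illustration."""
    if not feats_a or not feats_b:
        return []
    # Pairwise similarity matrix: rows index view A, columns index view B.
    sim = [[cosine(fa, fb) for fb in feats_b] for fa in feats_a]
    # Best match in B for every feature in A, and vice versa.
    best_b_for_a = [max(range(len(feats_b)), key=lambda j: row[j]) for row in sim]
    best_a_for_b = [
        max(range(len(feats_a)), key=lambda i: sim[i][j])
        for j in range(len(feats_b))
    ]
    # Keep only matches that agree in both directions.
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]


# Toy stand-ins for features sampled from two rendered views.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.0, 2.0], [3.0, 0.1]]
matches = mutual_nearest_neighbors(view_a, view_b)
```

The mutual check discards one-sided matches, such as the third feature of `view_a` here, which is a common way to filter unreliable correspondences before using them as a geometric constraint.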

