CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
This work addresses 3D fidelity issues in text-to-3D generation, a problem relevant to applications in graphics and AI. The contribution is incremental, however, since it builds on existing multi-view diffusion and NeRF methods.
The paper tackles poor 3D geometric fidelity in text-to-3D models, such as unreasonable concavities. It proposes CorrespondentDream, which uses cross-view correspondences from the diffusion U-Net to improve NeRF optimization, resulting in more coherent geometry and smoother object surfaces.
Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue: although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavities. In this work, we propose CorrespondentDream, an effective method that leverages annotation-free, cross-view correspondences obtained from the diffusion U-Net to provide an additional 3D prior to the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models whose geometries are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through various comparative qualitative results and a solid user study.
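To make the idea of annotation-free cross-view correspondences concrete, the following is a minimal sketch and not the paper's implementation. It assumes that per-pixel feature descriptors for two views have already been extracted (for instance from U-Net activations) and flattened to arrays of shape (N, C). Matches are found by mutual nearest neighbors under cosine similarity, which is one common matching heuristic; the abstract does not state which matching rule is used. The function names `mutual_nearest_neighbors` and `correspondence_loss`, and the squared pixel distance used as the penalty, are hypothetical choices made for illustration.

```python
import numpy as np


def mutual_nearest_neighbors(feat_a, feat_b):
    """Return (i, j) index pairs where feat_a[i] and feat_b[j] are each
    other's nearest neighbor under cosine similarity.

    feat_a: (Na, C) descriptors from view A (assumed precomputed).
    feat_b: (Nb, C) descriptors from view B (assumed precomputed).
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    a = feat_a / (np.linalg.norm(feat_a, axis=1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=1, keepdims=True) + 1e-8)
    sim = a @ b.T                      # (Na, Nb) similarity matrix
    nn_ab = sim.argmax(axis=1)         # best match in B for each row of A
    nn_ba = sim.argmax(axis=0)         # best match in A for each row of B
    idx_a = np.arange(a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a     # keep only pairs that agree both ways
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)


def correspondence_loss(pred_xy, target_xy):
    """Mean squared pixel distance between two sets of matched locations.

    pred_xy:   (M, 2) locations implied by the current 3D geometry
               (hypothetical input, e.g. from depth-based reprojection).
    target_xy: (M, 2) locations implied by the feature matches.
    """
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

In a setting like the one described above, the matched pairs would act as the target, and a penalty of this kind would be added to the NeRF optimization objective so that geometry disagreeing with the feature matches is discouraged. How the two sets of locations are obtained and weighted is left open here, since the abstract does not specify it.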