CV | Dec 6, 2022

NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors

arXiv:2212.03267v1 | 188 citations | h-index: 76
Originality: Incremental advance
AI Analysis

The paper addresses single-view 3D reconstruction and offers an incremental improvement by combining 2D diffusion priors with language guidance.

The paper tackles the ill-posed problem of 2D-to-3D reconstruction by proposing NeRDi, a framework that uses language-guided diffusion priors to synthesize NeRFs from single-view images, achieving higher quality novel views on the DTU MVS dataset and demonstrating zero-shot generalizability for in-the-wild images.

2D-to-3D reconstruction is an ill-posed problem, yet humans are good at solving this problem due to their prior knowledge of the 3D world developed over years. Driven by this observation, we propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models. Formulating single-view reconstruction as an image-conditioned 3D generation problem, we optimize the NeRF representations by minimizing a diffusion loss on its arbitrary view renderings with a pretrained image diffusion model under the input-view constraint. We leverage off-the-shelf vision-language models and introduce a two-section language guidance as conditioning inputs to the diffusion model. This is essentially helpful for improving multiview content coherence as it narrows down the general image prior conditioned on the semantic and visual features of the single-view input image. Additionally, we introduce a geometric loss based on estimated depth maps to regularize the underlying 3D geometry of the NeRF. Experimental results on the DTU MVS dataset show that our method can synthesize novel views with higher quality even compared to existing methods trained on this dataset. We also demonstrate our generalizability in zero-shot NeRF synthesis for in-the-wild images.
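The abstract names three ingredients of the optimization: a diffusion loss on renderings from arbitrary views, a constraint at the input view, and a geometric loss based on estimated depth maps. The toy sketch below shows one way such terms could be combined into a single objective. It is an illustration under stated assumptions, not the authors' implementation: every function name, the weights, the simple noising rule, the stand-in `toy_denoiser`, and the choice of a Pearson-correlation depth term are hypothetical, and images are flattened lists of floats rather than real NeRF renderings.

```python
import math
import random


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def input_view_loss(rendered_input_view, input_image):
    """Input-view constraint: the render at the input pose should match the image."""
    return mse(rendered_input_view, input_image)


def diffusion_loss(rendered_view, denoiser, text_condition, noise_level, rng):
    """Noise-prediction error of a denoiser on a noised rendering.

    Assumption: a simple variance-preserving noising rule stands in for the
    schedule of a real pretrained diffusion model.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in rendered_view]
    signal_scale = math.sqrt(1.0 - noise_level ** 2)
    noisy = [signal_scale * x + noise_level * n for x, n in zip(rendered_view, noise)]
    predicted_noise = denoiser(noisy, noise_level, text_condition)
    return mse(predicted_noise, noise)


def depth_correlation_loss(rendered_depth, estimated_depth):
    """1 - Pearson correlation between rendered and estimated depth.

    Assumption: a correlation-based term is used here because monocular depth
    estimates are typically only reliable up to positive scale and shift.
    """
    n = len(rendered_depth)
    mean_r = sum(rendered_depth) / n
    mean_e = sum(estimated_depth) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(rendered_depth, estimated_depth))
    var_r = sum((r - mean_r) ** 2 for r in rendered_depth)
    var_e = sum((e - mean_e) ** 2 for e in estimated_depth)
    return 1.0 - cov / math.sqrt(var_r * var_e + 1e-12)


def toy_denoiser(noisy, noise_level, text_condition):
    """Hypothetical stand-in for a pretrained text-conditioned denoiser.

    It ignores its inputs and predicts zero noise; a real model would not.
    """
    return [0.0 for _ in noisy]


def total_loss(novel_view, input_view_render, input_image, rendered_depth,
               estimated_depth, text_condition, rng,
               w_diffusion=1.0, w_input=1.0, w_depth=0.1):
    """Weighted sum of the three terms; the weights are placeholders."""
    return (
        w_diffusion * diffusion_loss(novel_view, toy_denoiser, text_condition, 0.5, rng)
        + w_input * input_view_loss(input_view_render, input_image)
        + w_depth * depth_correlation_loss(rendered_depth, estimated_depth)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.2, 0.4, 0.6, 0.8]
    loss = total_loss(
        novel_view=[0.3, 0.3, 0.5, 0.7],
        input_view_render=image,
        input_image=image,
        rendered_depth=[1.0, 2.0, 3.0, 4.0],
        estimated_depth=[12.0, 14.0, 16.0, 18.0],
        text_condition="a photo of a chair",  # placeholder for the language guidance
        rng=rng,
    )
    print(f"total loss: {loss:.4f}")
```

In a real pipeline the renderings would come from a differentiable NeRF renderer and the gradient of this objective would update the NeRF parameters. The text condition here is only a placeholder for the two-section language guidance described in the abstract.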

Code Implementations: 1 repo
