MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
MVIP-NeRF addresses view-inconsistent and geometrically misaligned inpainting in 3D scene reconstruction, representing an incremental improvement over existing NeRF inpainting techniques.
The paper tackles inconsistent, low-quality 3D inpainting in NeRF scenes by proposing MVIP-NeRF, which uses diffusion priors and joint multi-view optimization to recover appearance and geometry better than previous methods.
Despite the emergence of successful NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervision, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. This is due to two key reasons: (i) independently inpainting constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry completion and alignment with inpainted RGB images. To overcome these limitations, we propose a novel approach called MVIP-NeRF that harnesses the potential of diffusion priors for NeRF inpainting, addressing both appearance and geometry aspects. MVIP-NeRF performs joint inpainting across multiple views to reach a consistent solution, which is achieved via an iterative optimization process based on Score Distillation Sampling (SDS). Apart from recovering the rendered RGB images, we also extract normal maps as a geometric representation and define a normal SDS loss that encourages accurate geometry inpainting and alignment with the appearance. Additionally, we formulate a multi-view SDS score function to distill generative priors simultaneously from different view images, ensuring consistent visual completion when dealing with large view variations. Our experimental results show better appearance and geometry recovery than previous NeRF inpainting methods.
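To make the SDS-based optimization concrete, the following is a minimal sketch of a generic score distillation update applied jointly to a batch of rendered views. It is not the paper's multi-view score function or its normal SDS loss, whose exact forms are not given in this abstract. The names `sds_gradient` and `make_oracle`, the noise level `alpha_bar_t`, and the constant weight are assumptions for illustration, and a closed-form "oracle" noise predictor stands in for a pretrained diffusion model so that the example runs without one.

```python
import numpy as np

def sds_gradient(views, predict_noise, alpha_bar_t, weight, rng):
    """Generic SDS-style gradient with respect to a batch of rendered views.

    views:         array of shape (V, H, W, C), one rendered image per view
    predict_noise: callable mapping noisy views to predicted noise
                   (hypothetical stand-in for a diffusion model)
    alpha_bar_t:   cumulative noise-schedule coefficient in (0, 1)
    weight:        scalar weighting w(t)
    """
    # Sample one noise tensor covering all views and perturb them jointly.
    eps = rng.standard_normal(views.shape)
    noisy = np.sqrt(alpha_bar_t) * views + np.sqrt(1.0 - alpha_bar_t) * eps
    # SDS uses the residual between predicted and injected noise as the
    # gradient signal, skipping the denoiser's Jacobian.
    return weight * (predict_noise(noisy) - eps)

def make_oracle(target, alpha_bar_t):
    """Hypothetical noise predictor that 'knows' the clean target views.

    It returns the noise that would explain the noisy input if the clean
    image were `target`. A real system would query a pretrained model.
    """
    a, s = np.sqrt(alpha_bar_t), np.sqrt(1.0 - alpha_bar_t)
    return lambda noisy: (noisy - a * target) / s

# Toy usage: three 4x4 RGB "views" are pulled toward a shared target.
rng = np.random.default_rng(0)
views = rng.random((3, 4, 4, 3))
target = np.full_like(views, 0.5)
alpha_bar_t = 0.6
oracle = make_oracle(target, alpha_bar_t)

start_error = np.abs(views - target).mean()
for _ in range(50):
    grad = sds_gradient(views, oracle, alpha_bar_t, weight=1.0, rng=rng)
    views = views - 0.1 * grad  # in a NeRF, this would backpropagate to the scene parameters
end_error = np.abs(views - target).mean()
```

In this toy setting the update reduces to a step toward the oracle's target, so `end_error` ends up smaller than `start_error`. In an actual NeRF pipeline the gradient would be propagated through the renderer to the scene parameters, and the same pattern could in principle be applied to rendered normal maps instead of RGB images.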