NeRFEditor: Differentiable Style Decomposition for Full 3D Scene Editing
NeRFEditor addresses high-quality, identity-preserving 3D scene editing for applications in computer graphics and vision, though the contribution appears incremental because it builds on existing NeRF and StyleGAN methods.
The paper tackles 3D scene editing by introducing NeRFEditor, a framework in which StyleGAN and NeRF learn from each other. It supports editing guided by reference images, text prompts, or user interactions, and reports better editability, fidelity, and identity preservation than prior work.
We present NeRFEditor, an efficient learning framework for 3D scene editing, which takes a video captured over 360° as input and outputs a high-quality, identity-preserving stylized 3D scene. Our method supports diverse types of editing, including editing guided by reference images, text prompts, and user interactions. We achieve this by encouraging a pre-trained StyleGAN model and a NeRF model to learn from each other. Specifically, we use the NeRF model to generate numerous image-angle pairs to train an adjustor, which adjusts the StyleGAN latent code so that StyleGAN generates high-fidelity stylized images for any given angle. To extrapolate editing to views outside the GAN's domain, we devise a second module that is trained in a self-supervised manner. This module maps novel-view images to the hidden space of StyleGAN, which allows StyleGAN to generate stylized images for novel views. Together, these two modules produce guided images across 360° views, which are used to fine-tune the NeRF so that it renders the stylization; we propose a stable fine-tuning strategy for this step. Experiments show that NeRFEditor outperforms prior work on benchmark and real-world scenes, with better editability, fidelity, and identity preservation.
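The adjustor step described above can be illustrated with a toy sketch. It is not the paper's implementation: `nerf_render` and `generator` are hypothetical stand-ins for a trained NeRF and a frozen StyleGAN, the 4-value "images" and latents are illustrative, and the adjustor is assumed to be a linear map from angle features to a latent offset, trained by plain gradient descent. The self-supervised out-of-domain module and the NeRF fine-tuning stage are omitted.

```python
import math

def nerf_render(angle_deg):
    # Hypothetical stand-in for a trained NeRF: a toy 4-value "image"
    # that varies smoothly with the view angle.
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a), math.cos(2 * a), math.sin(2 * a)]

def generator(latent):
    # Hypothetical stand-in for a frozen StyleGAN: identity on a 4-value latent.
    return list(latent)

def angle_features(angle_deg):
    # Bias term plus first and second harmonics of the view angle.
    a = math.radians(angle_deg)
    return [1.0, math.cos(a), math.sin(a), math.cos(2 * a), math.sin(2 * a)]

def adjust(weights, base_latent, angle_deg):
    # Assumed adjustor form: base latent plus an angle-dependent offset
    # that is linear in the angle features.
    feats = angle_features(angle_deg)
    return [w0 + sum(w * f for w, f in zip(row, feats))
            for w0, row in zip(base_latent, weights)]

def mean_loss(weights, base_latent, pairs):
    # Mean over views of the squared error between generator output and NeRF image.
    total = 0.0
    for image, angle in pairs:
        out = generator(adjust(weights, base_latent, angle))
        total += sum((o - t) ** 2 for o, t in zip(out, image))
    return total / len(pairs)

def train_adjustor(base_latent, pairs, steps=60, lr=0.5):
    # Full-batch gradient descent on the adjustor weights; the generator is frozen.
    dim, n_feats = len(base_latent), len(angle_features(0.0))
    weights = [[0.0] * n_feats for _ in range(dim)]
    for _ in range(steps):
        grads = [[0.0] * n_feats for _ in range(dim)]
        for image, angle in pairs:
            feats = angle_features(angle)
            out = generator(adjust(weights, base_latent, angle))
            for d in range(dim):
                err = 2.0 * (out[d] - image[d]) / len(pairs)
                for k in range(n_feats):
                    grads[d][k] += err * feats[k]
        for d in range(dim):
            for k in range(n_feats):
                weights[d][k] -= lr * grads[d][k]
    return weights

# The NeRF stand-in supplies image-angle pairs covering the full 360 degrees.
pairs = [(nerf_render(a), a) for a in range(0, 360, 10)]
base_latent = [0.3, -0.2, 0.1, 0.5]  # arbitrary starting latent code
untrained = [[0.0] * 5 for _ in range(4)]

loss_before = mean_loss(untrained, base_latent, pairs)
weights = train_adjustor(base_latent, pairs)
loss_after = mean_loss(weights, base_latent, pairs)

# The trained adjustor should also match an angle that was not in the training set.
held_out = generator(adjust(weights, base_latent, 45.0))
held_out_error = max(abs(o - t) for o, t in zip(held_out, nerf_render(45.0)))
```

In this toy setting the target is exactly representable by the assumed adjustor, so the loss drops to nearly zero and the held-out angle is matched as well. A real adjustor would be a learned network operating on StyleGAN latent codes and image losses.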