Designing a 3D-Aware StyleNeRF Encoder for Face Editing
This work addresses the challenge of 3D inconsistency in face manipulation for applications such as video editing, though its contribution is incremental because it builds on existing 3D-aware GAN inversion methods.
The paper tackles the problem of 3D-consistent face editing by proposing a 3D-aware encoder for GAN inversion based on StyleNeRF, achieving improved multi-view and temporal consistency in facial attribute editing and texture transfer.
GAN inversion has been exploited in many face manipulation tasks, but 2D GANs often fail to generate multi-view 3D-consistent images, and encoders designed for 2D GANs cannot provide sufficient 3D information for inversion and editing. 3D-aware GAN inversion has therefore been proposed to increase the 3D editing capability of GANs, but it remains under-explored. To tackle this problem, we propose a 3D-aware (3Da) encoder for GAN inversion and face editing based on the powerful StyleNeRF model. Our 3Da encoder combines a parametric 3D face model with a learnable detail representation model to generate geometry, texture, and view-direction codes. For more flexible face manipulation, we then design a dual-branch StyleFlow module that transfers the StyleNeRF codes through disentangled geometry and texture flows. Extensive experiments demonstrate that we achieve 3D-consistent face manipulation in both facial attribute editing and texture transfer. Furthermore, for video editing, we make the sequence of frame codes share a common canonical manifold, which improves the temporal consistency of the edited attributes.
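The abstract names the dual-branch StyleFlow module but does not describe its internals. The Python below is a rough sketch, not the authors' implementation, of what separate geometry and texture flows make possible. It assumes the encoder has already produced separate geometry, texture, and view-direction codes. It also swaps the continuous normalizing flows used by StyleFlow for a toy per-dimension affine flow conditioned on a single scalar attribute. `StyleNeRFCodes`, `ConditionalAffineFlow`, and `edit_codes` are hypothetical names, and the code sizes and parameters are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class StyleNeRFCodes:
    """Hypothetical container for the three codes produced by the encoder."""
    geometry: list[float]
    texture: list[float]
    view: list[float]


class ConditionalAffineFlow:
    """Toy invertible flow w = z * exp(s * a) + t * a, conditioned on attribute a.

    Assumed stand-in for a StyleFlow-style conditional normalizing flow;
    it is exactly invertible for any real attribute value a.
    """

    def __init__(self, scales: list[float], shifts: list[float]) -> None:
        self.scales = scales
        self.shifts = shifts

    def forward(self, z: list[float], a: float) -> list[float]:
        # Map a base code z to a style code w under attribute value a.
        return [zi * math.exp(si * a) + ti * a
                for zi, si, ti in zip(z, self.scales, self.shifts)]

    def inverse(self, w: list[float], a: float) -> list[float]:
        # Exact inverse of forward() for the same attribute value a.
        return [(wi - ti * a) * math.exp(-si * a)
                for wi, si, ti in zip(w, self.scales, self.shifts)]


def edit_codes(
    codes: StyleNeRFCodes,
    geo_flow: ConditionalAffineFlow,
    tex_flow: ConditionalAffineFlow,
    geo_attr: tuple[float, float],
    tex_attr: tuple[float, float],
) -> StyleNeRFCodes:
    """Edit geometry and texture in two independent branches.

    Each *_attr is (source value, target value). A code is inverted to the
    base space under its source attribute, then mapped forward under its
    target attribute. The branches share no parameters, so a geometry edit
    never touches the texture code; the view code is passed through as-is.
    """
    z_geo = geo_flow.inverse(codes.geometry, geo_attr[0])
    z_tex = tex_flow.inverse(codes.texture, tex_attr[0])
    return StyleNeRFCodes(
        geometry=geo_flow.forward(z_geo, geo_attr[1]),
        texture=tex_flow.forward(z_tex, tex_attr[1]),
        view=list(codes.view),
    )


if __name__ == "__main__":
    codes = StyleNeRFCodes(geometry=[1.0, 2.0], texture=[0.5, -1.0], view=[0.0, 1.0])
    geo_flow = ConditionalAffineFlow(scales=[0.5, -0.2], shifts=[0.1, 0.3])
    tex_flow = ConditionalAffineFlow(scales=[0.3, 0.1], shifts=[-0.2, 0.4])
    # Geometry-only edit: texture source and target attributes are equal.
    edited = edit_codes(codes, geo_flow, tex_flow, geo_attr=(0.0, 1.0), tex_attr=(0.0, 0.0))
    print(edited)
```

Because each branch is inverted and re-applied under its own attribute, holding the texture attribute fixed leaves the texture code untouched while the geometry code moves. Texture transfer is the mirror case. In this toy setting, that separation is the disentanglement the dual-branch design is meant to provide.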