CVAIMar 23, 2023

TriPlaneNet: An Encoder for EG3D Inversion

arXiv:2303.13497v256 citationsh-index: 86
Originality Incremental advance
AI Analysis

This addresses the need for efficient 3D face reconstruction and editing, though it is incremental as it builds on existing EG3D and encoder frameworks.

The paper tackles the problem of fast 3D GAN inversion for face images by introducing TriPlaneNet, an encoder-based method that directly predicts tri-plane offsets for the EG3D model, achieving renderings similar to optimization-based techniques while being significantly faster.

Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads with a possibility for novel view rendering. At the same time, one must solve an inverse problem to be able to re-render or modify an existing image or video. Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view, whereas optimization-based 3D GAN inversion methods are time-consuming and can require at least several minutes per image. Fast encoder-based techniques, such as those developed for StyleGAN, may also be less appealing due to the lack of identity preservation. Our work introduces a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model. In particular, we build upon a feed-forward convolutional encoder for the latent code and extend it with a fully-convolutional predictor of tri-plane numerical offsets. The renderings are similar in quality to the ones produced by optimization-based techniques and outperform the ones by encoder-based methods. As we empirically prove, this is a consequence of directly operating in the tri-plane space, not in the GAN parameter space, while making use of an encoder-based trainable approach. Finally, we demonstrate significantly more correct embedding of a face image in 3D than for all the baselines, further strengthened by a probably symmetric prior enabled during training.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes