CVDec 20, 2021

3D-aware Image Synthesis via Learning Structural and Textural Representations

arXiv:2112.10759v2142 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of generating high-fidelity 3D-aware images for computer vision and graphics applications, representing an incremental improvement over existing GAN-NeRF hybrids.

The paper tackles the challenge of 3D-aware image synthesis by proposing VolumeGAN, which learns separate structural and textural representations to improve global structure awareness and efficiency, achieving higher image quality and better 3D control than previous methods.

Making generative models 3D-aware bridges the 2D image space and the 3D physical world yet remains challenging. Recent attempts equip a Generative Adversarial Network (GAN) with a Neural Radiance Field (NeRF), which maps 3D coordinates to pixel values, as a 3D prior. However, the implicit function in NeRF has a very local receptive field, making the generator hard to become aware of the global structure. Meanwhile, NeRF is built on volume rendering which can be too costly to produce high-resolution results, increasing the optimization difficulty. To alleviate these two problems, we propose a novel framework, termed as VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation. We first learn a feature volume to represent the underlying structure, which is then converted to a feature field using a NeRF-like model. The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis. Such a design enables independent control of the shape and the appearance. Extensive experiments on a wide range of datasets show that our approach achieves sufficiently higher image quality and better 3D control than the previous methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes