GVP: Generative Volumetric Primitives
GVP targets high-resolution 3D-aware image generation, with applications in computer graphics and virtual reality, and is presented as a new approach rather than an incremental improvement.
The paper designs a pure 3D generative model for high-resolution image synthesis that does not compromise multiview consistency. It samples and renders 512-resolution images in real time, with better efficiency and 3D consistency than state-of-the-art methods.
Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution is still an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. The training of such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.
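To illustrate why a mixture of volumetric primitives captures sparsity, here is a minimal sketch of querying and ray-marching such a mixture. It is not the GVP implementation: in GVP the primitives and their spatial information come from a 2D convolutional generator, whereas here they are set by hand. The parameterization is an assumption made for illustration: each hypothetical `Primitive` is an axis-aligned cube (`center`, `half_size`) holding a small voxel grid of (density, scalar color) pairs, with nearest-voxel lookup, densities of overlapping primitives summed, and standard alpha compositing along the ray.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Sample = Tuple[float, float]  # (density, scalar color)


@dataclass
class Primitive:
    # Hypothetical parameterization: axis-aligned cube with a voxel payload.
    center: Vec3
    half_size: float
    grid: List[List[List[Sample]]]  # indexed [z][y][x], cubic resolution

    def lookup(self, p: Vec3) -> Optional[Sample]:
        """Nearest-voxel lookup; None if p lies outside this primitive."""
        res = len(self.grid)
        idx = []
        for pc, cc in zip(p, self.center):
            u = (pc - cc) / (2.0 * self.half_size) + 0.5  # local coord in [0, 1)
            if u < 0.0 or u >= 1.0:
                return None
            idx.append(min(int(u * res), res - 1))
        x, y, z = idx
        return self.grid[z][y][x]


def constant_grid(res: int, density: float, color: float) -> List[List[List[Sample]]]:
    """Helper that fills a res^3 grid with one (density, color) value."""
    return [[[(density, color) for _ in range(res)] for _ in range(res)] for _ in range(res)]


def query_mixture(prims: List[Primitive], p: Vec3) -> Sample:
    """Sum densities of all primitives covering p; color is density-weighted.
    Points covered by no primitive are empty space (density 0)."""
    total_density, weighted_color = 0.0, 0.0
    for prim in prims:
        s = prim.lookup(p)
        if s is None:
            continue
        d, c = s
        total_density += d
        weighted_color += d * c
    color = weighted_color / total_density if total_density > 0.0 else 0.0
    return total_density, color


def render_ray(prims: List[Primitive], origin: Vec3, direction: Vec3,
               t_near: float, t_far: float, n_steps: int) -> Tuple[float, float]:
    """Midpoint ray marching with alpha compositing.
    Returns (accumulated color, remaining transmittance)."""
    dt = (t_far - t_near) / n_steps
    transmittance, out = 1.0, 0.0
    for i in range(n_steps):
        t = t_near + (i + 0.5) * dt
        p = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, c = query_mixture(prims, p)
        if sigma == 0.0:
            continue  # empty space contributes nothing
        alpha = 1.0 - math.exp(-sigma * dt)
        out += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return out, transmittance
```

For example, a single unit cube at the origin with density 1.0 and color 0.8, hit by a ray along +z, yields a transmittance of about exp(-1) after the ray crosses one unit of material. Because every point outside the primitives is skipped, the cost scales with the occupied volume rather than with a dense grid over the whole scene, which is the sparsity benefit the abstract refers to.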