3D-Aware Video Generation
This work addresses the challenge of generating video content that stays consistent across both viewpoint and time, with applications in media and simulation. It is an incremental advance over existing 3D-aware and video generative models.
The paper develops a 4D GAN framework that synthesizes 3D-aware videos using only monocular video supervision, with image quality comparable to that of existing 3D or video GANs.
Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. In this work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D-aware video using only monocular videos for supervision. We show that our method learns a rich embedding of decomposable 3D structures and motions that enables new visual effects of spatio-temporal renderings while producing imagery with quality comparable to that of existing 3D or video GANs.
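To make the idea of a time-conditioned neural implicit representation concrete, the following is a minimal NumPy sketch and not the architecture used in this work. It assumes a small untrained MLP that maps a 3D point, a timestamp, and two latent codes (here called `z_content` and `z_motion`, both hypothetical names) to density and color, followed by standard volume-rendering quadrature along a single camera ray. The layer sizes, code dimensions, and near/far bounds are illustrative assumptions, and the time-aware discriminator is not sketched.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Create random weights for a small fully connected network (untrained)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the network with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def field_4d(params, xyz, t, z_content, z_motion):
    """Hypothetical 4D field: (x, y, z, t, latent codes) -> (density, rgb)."""
    n = xyz.shape[0]
    t_col = np.full((n, 1), t)
    codes = np.concatenate([z_content, z_motion])
    codes = np.broadcast_to(codes, (n, codes.size))
    out = mlp(params, np.concatenate([xyz, t_col, codes], axis=1))
    density = np.logaddexp(0.0, out[:, :1])      # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:4]))     # sigmoid keeps color in (0, 1)
    return density, rgb

def render_ray(params, origin, direction, t, z_content, z_motion,
               near=0.5, far=2.5, n_samples=32):
    """Volume-render one pixel by sampling the field along a camera ray."""
    depths = np.linspace(near, far, n_samples)
    pts = origin + depths[:, None] * direction
    density, rgb = field_4d(params, pts, t, z_content, z_motion)
    delta = (far - near) / (n_samples - 1)       # uniform sample spacing
    alpha = 1.0 - np.exp(-density[:, 0] * delta)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)

# Usage: the same latent codes rendered at two timestamps from one viewpoint.
rng = np.random.default_rng(0)
params = init_mlp(rng, [3 + 1 + 8 + 8, 64, 64, 4])
z_content, z_motion = rng.normal(size=8), rng.normal(size=8)
origin = np.array([0.0, 0.0, -1.5])
direction = np.array([0.0, 0.0, 1.0])
pixel_t0 = render_ray(params, origin, direction, 0.0, z_content, z_motion)
pixel_t1 = render_ray(params, origin, direction, 0.5, z_content, z_motion)
```

Because viewpoint (`origin`, `direction`) and time (`t`) are separate inputs in this sketch, one can be varied while the other is held fixed, which is the kind of spatio-temporal rendering the abstract refers to. In an adversarial setup, frames rendered this way would be passed to a discriminator that also receives timing information.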