CV · Nov 25, 2022

Dynamic Neural Portraits

arXiv:2211.13994v1 · 1 citation · h-index: 82
Originality: Incremental advance
AI Analysis

This paper addresses efficient, high-quality video portrait generation for applications such as virtual avatars and media production, although its gains in speed and quality over existing methods are incremental.

The paper tackles full-head reenactment by generating photo-realistic video portraits with explicit control over head pose, facial expressions and eye gaze. It runs at 24 fps for resolutions up to 1024x1024, is 270 times faster than recent NeRF-based reenactment methods, and outperforms prior work in visual quality.

We present Dynamic Neural Portraits, a novel approach to the problem of full-head reenactment. Our method generates photo-realistic video portraits by explicitly controlling head pose, facial expressions and eye gaze. Our proposed architecture differs from existing methods that rely on GAN-based image-to-image translation networks to transform renderings of 3D faces into photo-realistic images. Instead, we build our system upon a 2D coordinate-based MLP with controllable dynamics. Our intuition for adopting a 2D-based representation, as opposed to recent 3D NeRF-like systems, stems from the fact that video portraits are captured by monocular stationary cameras; therefore, only a single viewpoint of the scene is available. Primarily, we condition our generative model on expression blendshapes; nonetheless, we show that our system can be successfully driven by audio features as well. Our experiments demonstrate that the proposed method is 270 times faster than recent NeRF-based reenactment methods, with our networks achieving speeds of 24 fps for resolutions up to 1024 x 1024, while outperforming prior works in terms of visual quality.
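To make the idea of a "2D coordinate-based MLP with controllable dynamics" concrete, here is a minimal sketch, not the authors' implementation. Each output pixel is predicted independently from its normalised (x, y) coordinate plus a per-frame control vector holding head pose, expression blendshape weights and eye gaze. The Fourier-feature encoding, the layer sizes, conditioning by simple concatenation, the random weights and the names `init_mlp`, `pixel_rgb` and `render_frame` are all hypothetical; the paper may encode coordinates and inject the controls differently.

```python
import math
import random

# Hypothetical sketch of a 2D coordinate-based MLP f(x, y, c) -> (r, g, b).
# c concatenates per-frame controls (pose, blendshapes, gaze). The encoding,
# layer sizes, concatenation-based conditioning and random weights are
# illustrative assumptions, not the design from the paper.

NUM_FREQS = 4  # number of octave frequencies in the coordinate encoding (assumed)


def fourier_features(x, y):
    """Encode a 2D pixel coordinate with sin/cos at octave frequencies."""
    feats = [x, y]
    for k in range(NUM_FREQS):
        freq = (2.0 ** k) * math.pi
        feats += [math.sin(freq * x), math.cos(freq * x),
                  math.sin(freq * y), math.cos(freq * y)]
    return feats


def init_layer(n_in, n_out, rng):
    """Random dense layer: weight matrix (n_out x n_in) and zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, inputs):
    """Affine map W @ inputs + b using plain Python lists."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def init_mlp(control_dim, hidden=32, seed=0):
    """Three-layer MLP whose input is [encoded (x, y)] + [control vector]."""
    rng = random.Random(seed)
    in_dim = 2 + 4 * NUM_FREQS + control_dim
    return [init_layer(in_dim, hidden, rng),
            init_layer(hidden, hidden, rng),
            init_layer(hidden, 3, rng)]


def pixel_rgb(mlp, x, y, control):
    """Predict one RGB value in [0, 1] for a single coordinate and control."""
    h = fourier_features(x, y) + list(control)
    h = relu(dense(mlp[0], h))
    h = relu(dense(mlp[1], h))
    return sigmoid(dense(mlp[2], h))


def render_frame(mlp, control, width, height):
    """Query the MLP once per pixel; coordinates are normalised to [-1, 1]."""
    frame = []
    for j in range(height):
        y = 2.0 * (j + 0.5) / height - 1.0
        row = []
        for i in range(width):
            x = 2.0 * (i + 0.5) / width - 1.0
            row.append(pixel_rgb(mlp, x, y, control))
        frame.append(row)
    return frame


if __name__ == "__main__":
    # Hypothetical control layout: 3 pose angles + 8 blendshapes + 2 gaze angles.
    control = [0.1, -0.05, 0.0] + [0.0] * 8 + [0.0, 0.2]
    mlp = init_mlp(control_dim=len(control))
    frame = render_frame(mlp, control, width=8, height=8)
    print(len(frame), len(frame[0]), len(frame[0][0]))  # 8 8 3
```

Because the controls are just extra inputs, driving the same network from audio, as the abstract reports, would change only what goes into the control vector, not the per-pixel rendering loop. A trained system would learn its weights from the target video instead of using random ones, and the pure-Python pixel loop here is written for readability; reaching real-time rates would require evaluating all pixels in parallel on a GPU.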
