CV · Jan 7, 2021

PVA: Pixel-aligned Volumetric Avatars

Amit Raj, Michael Zollhoefer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, Stephen Lombardi

arXiv:2101.02697v1 · 25.151 citations

Originality: Highly original

AI Analysis

This research is significant for improving the quality and generalizability of virtual telepresence systems for a broad range of users, offering an incremental advancement in volumetric avatar technology.

This paper addresses the challenge of creating photo-realistic human head avatars for virtual telepresence, specifically focusing on generalizing volumetric models across multiple identities. The authors propose a novel parameterization combining neural radiance fields with local, pixel-aligned features, enabling the model to outperform existing state-of-the-art methods in quality and generate faithful facial expressions in a multi-identity setting.

Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence. Currently, the highest quality is achieved by volumetric approaches trained in a person-specific manner on multi-view data. These models better represent fine structure, such as hair, compared to simpler mesh-based models. Volumetric models typically employ a global code to represent facial expressions, such that they can be driven by a small set of animation parameters. While such architectures achieve impressive rendering quality, they cannot easily be extended to the multi-identity setting. In this paper, we devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs. We enable generalization across identities by a novel parameterization that combines neural radiance fields with local, pixel-aligned features extracted directly from the inputs, thus sidestepping the need for very deep or complex networks. Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision. We demonstrate that our approach outperforms the existing state of the art in terms of quality and is able to generate faithful facial expressions in a multi-identity setting.
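To make the phrase "pixel-aligned features" concrete, here is a minimal NumPy sketch of the general idea: project a 3D query point into an input view and bilinearly sample that view's feature map at the resulting pixel. This is an illustration under stated assumptions, not the authors' implementation. The function names `project_points` and `sample_pixel_aligned`, the pinhole camera model (`K`, `R`, `t`), and the `(H, W, C)` feature layout are all hypothetical choices made for this example. In a full pipeline the sampled feature would presumably be passed, together with the query point, to a radiance field network that predicts color and density.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project world-space points (N, 3) to pixel coordinates (N, 2).

    Assumes a simple pinhole camera: K is the 3x3 intrinsics matrix,
    R (3x3) and t (3,) map world coordinates to camera coordinates.
    """
    cam = points @ R.T + t            # world -> camera coordinates
    uvw = cam @ K.T                   # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide

def sample_pixel_aligned(feature_map, uv):
    """Bilinearly sample an (H, W, C) feature map at continuous pixels uv (N, 2).

    uv[:, 0] is the horizontal (column) coordinate, uv[:, 1] the vertical (row)
    coordinate. Locations outside the image are clamped to the border.
    """
    H, W, _ = feature_map.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    wu = (u - u0)[:, None]            # horizontal interpolation weight
    wv = (v - v0)[:, None]            # vertical interpolation weight
    top = (1 - wu) * feature_map[v0, u0] + wu * feature_map[v0, u1]
    bottom = (1 - wu) * feature_map[v1, u0] + wu * feature_map[v1, u1]
    return (1 - wv) * top + wv * bottom

# Toy example: a 5x5 feature map whose two channels store the pixel's own (u, v).
H, W = 5, 5
vv, uu = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
feature_map = np.stack([uu, vv], axis=-1).astype(float)

K = np.array([[10.0, 0.0, 2.0],
              [0.0, 10.0, 2.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

points = np.array([[0.05, 0.1, 1.0]])   # one 3D query point in front of the camera
uv = project_points(points, K, R, t)
features = sample_pixel_aligned(feature_map, uv)
```

Because each feature is looked up at the location where the 3D point lands in the input image, the conditioning is local to that point rather than a single global code, which is the property the abstract credits for generalization across identities.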

