CV · Sep 14, 2023

DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

arXiv:2309.07752v15.04 citations · h-index: 3

Originality: Incremental advance

AI Analysis

DT-NeRF improves photorealistic rendering of talking faces for applications such as virtual avatars and video conferencing; it is an incremental advance over existing NeRF-based methods.

The paper tackles high-fidelity talking portrait synthesis by decomposing facial regions into specialized triplanes for the mouth and broader features, achieving state-of-the-art results on key datasets.

In this paper, we present the decomposed triplane-hash neural radiance fields (DT-NeRF), a framework that significantly improves the photorealistic rendering of talking faces and achieves state-of-the-art results on key evaluation datasets. Our architecture decomposes the facial region into two specialized triplanes: one specialized for representing the mouth, and the other for the broader facial features. We introduce audio features as residual terms and integrate them as query vectors into our model through an audio-mouth-face transformer. Additionally, our method leverages the capabilities of Neural Radiance Fields (NeRF) to enrich the volumetric representation of the entire face through additive volumetric rendering techniques. Comprehensive experimental evaluations corroborate the effectiveness and superiority of our proposed approach.
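The two-triplane decomposition described in the abstract can be illustrated with a generic triplane lookup. The sketch below is not the authors' implementation: it uses small dense NumPy grids instead of a hash encoding, omits the audio-mouth-face transformer entirely, and assumes that the mouth and face features are simply summed. The names `bilinear_sample`, `triplane_features`, and `make_triplane`, along with the resolution and channel count, are hypothetical choices made for illustration.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate an (H, W, C) feature plane at u, v in [0, 1]."""
    h, w, _ = plane.shape
    x = float(np.clip(u, 0.0, 1.0)) * (w - 1)   # u indexes the width axis
    y = float(np.clip(v, 0.0, 1.0)) * (h - 1)   # v indexes the height axis
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * plane[y0, x0] + fx * plane[y0, x1]
    bottom = (1 - fx) * plane[y1, x0] + fx * plane[y1, x1]
    return (1 - fy) * top + fy * bottom

def triplane_features(planes, point):
    """Project a 3D point in [0, 1]^3 onto the XY, XZ, and YZ planes and sum the features."""
    x, y, z = point
    xy, xz, yz = planes
    return (bilinear_sample(xy, x, y)
            + bilinear_sample(xz, x, z)
            + bilinear_sample(yz, y, z))

def make_triplane(rng, res=8, channels=4):
    """Hypothetical helper: three random feature planes standing in for learned ones."""
    return [rng.standard_normal((res, res, channels)) for _ in range(3)]

rng = np.random.default_rng(0)
mouth_planes = make_triplane(rng)   # stands in for the mouth-specific triplane
face_planes = make_triplane(rng)    # stands in for the broader-face triplane

point = np.array([0.3, 0.6, 0.5])
mouth_feat = triplane_features(mouth_planes, point)
face_feat = triplane_features(face_planes, point)

# Assumption: the two branches are combined additively before decoding.
combined = mouth_feat + face_feat
```

In a full model, a feature vector such as `combined` would typically be decoded by a small MLP into density and color for volume rendering; how DT-NeRF conditions this step on audio is not specified beyond what the abstract states.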
