CV · Dec 15, 2022

MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

Microsoft
arXiv:2212.08062v3 · 94 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and high-quality talking head generation for applications like virtual avatars, though it builds incrementally on prior methods.

The paper tackles the problem of generating realistic talking head videos while preserving identity, achieving high fidelity and enabling fast personalized adaptation in as little as 30 seconds.

In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait. Although the proposed model surpasses prior generation fidelity on established benchmarks, personalized fine-tuning is usually still needed to make talking head generation suitable for real use. However, this process is so computationally demanding that it is unaffordable for standard users. To solve this, we propose a fast adaptation model using a meta-learning approach. The learned model can be adapted to a high-quality personalized model in as little as 30 seconds. Last but not least, a spatial-temporal enhancement module is proposed to improve fine details while ensuring temporal coherency. Extensive experiments demonstrate the significant superiority of our approach over the state of the art in both one-shot and personalized settings.
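
To make the fast-adaptation idea concrete, here is a minimal toy sketch of meta-learning an initialization that adapts in a few gradient steps. The abstract does not say which meta-learning algorithm the authors use, so this example assumes Reptile, a simple first-order method, and replaces the talking head network with a one-parameter model y = w*x. Every name and number below (`reptile`, `adapt`, the slopes, the learning rates) is hypothetical and chosen only for illustration; it is not the paper's implementation.

```python
import random


def loss_and_grad(w, slope, xs):
    """Mean squared error of y = w*x against the task target y = slope*x,
    together with its gradient with respect to w."""
    n = len(xs)
    loss = sum((w * x - slope * x) ** 2 for x in xs) / n
    grad = sum(2 * (w * x - slope * x) * x for x in xs) / n
    return loss, grad


def adapt(w, slope, xs, steps, lr):
    """Personalize: take a few plain gradient steps on a single task."""
    for _ in range(steps):
        _, grad = loss_and_grad(w, slope, xs)
        w -= lr * grad
    return w


def reptile(task_slopes, xs, meta_iters=500, inner_steps=3,
            inner_lr=0.1, meta_lr=0.1, seed=0):
    """Reptile outer loop (an assumed stand-in for the paper's method):
    nudge the shared initialization toward each task's adapted weight."""
    rng = random.Random(seed)
    w_meta = 0.0
    for _ in range(meta_iters):
        slope = rng.choice(task_slopes)               # sample one task
        w_task = adapt(w_meta, slope, xs, inner_steps, inner_lr)
        w_meta += meta_lr * (w_task - w_meta)         # move init toward it
    return w_meta


# Toy "identities": each task is a different target slope.
XS = [-1.0, -0.5, 0.5, 1.0]
TRAIN_SLOPES = [4.0, 4.5, 5.0, 5.5, 6.0]

w_meta = reptile(TRAIN_SLOPES, XS)

# Adapt to an unseen task with the same small step budget, starting
# from the meta-learned initialization and from a naive zero start.
new_slope = 5.2
loss_meta, _ = loss_and_grad(adapt(w_meta, new_slope, XS, 3, 0.1), new_slope, XS)
loss_scratch, _ = loss_and_grad(adapt(0.0, new_slope, XS, 3, 0.1), new_slope, XS)

print("meta-learned init:", w_meta)
print("loss after 3 steps from meta init:", loss_meta)
print("loss after 3 steps from scratch:", loss_scratch)
```

With the same three-step budget, the meta-learned start ends at a lower loss on the unseen task than the zero start, which is the intuition behind adapting a personalized model quickly rather than fine-tuning from a generic one.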

Code Implementations: 1 repo
