CVJun 15, 2024

A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing

Ming Meng, Yufei Zhao, Bo Zhang, Yonggui Zhu, Weimin Shi, Maxwell Wen, Zhaoxin Fan

arXiv:2406.10553v27.66 citations

Originality Synthesis-oriented

AI Analysis

It offers a comprehensive framework for researchers in virtual reality, augmented reality, and game production, but is incremental as it synthesizes existing work.

This survey systematically reviews talking head synthesis, categorizing it into portrait generation, driving mechanisms, and editing techniques, and provides a performance analysis based on datasets and evaluation metrics to support future research.

Talking head synthesis, an advanced method for generating portrait videos from a still image driven by specific content, has garnered widespread attention in virtual reality, augmented reality and game production. Recently, significant breakthroughs have been made with the introduction of novel models such as the transformer and the diffusion model. Current methods can not only generate new content but also edit the generated material. This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driven mechanisms, and editing techniques. We summarize milestone studies and critically analyze their innovations and shortcomings within each domain. Additionally, we organize an extensive collection of datasets and provide a thorough performance analysis of current methodologies based on various evaluation metrics, aiming to furnish a clear framework and robust data support for future research. Finally, we explore application scenarios of talking head synthesis, illustrate them with specific cases, and examine potential future directions.

View on arXiv PDF

Similar