CV, Aug 30, 2023

From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications

arXiv:2308.16041v1, 10 citations, h-index: 12
Originality: Synthesis-oriented
AI Analysis

The paper serves as a reference for researchers and practitioners in computer vision; as a survey, its contribution is incremental.

This paper surveys state-of-the-art methods for talking head generation, categorizes them into image-driven, audio-driven, video-driven and other (NeRF and 3D-based) approaches, and compares publicly available models on inference time and human-rated quality.

Recent advancements in deep learning and computer vision have led to a surge of interest in generating realistic talking heads. This paper presents a comprehensive survey of state-of-the-art methods for talking head generation. We systematically categorise them into four main approaches: image-driven, audio-driven, video-driven and others (including neural radiance fields (NeRF) and 3D-based methods). We provide an in-depth analysis of each method, highlighting its unique contributions, strengths, and limitations. Furthermore, we thoroughly compare publicly available models, evaluating them on key aspects such as inference time and human-rated quality of the generated outputs. Our aim is to provide a clear and concise overview of the current landscape in talking head generation, elucidating the relationships between different approaches and identifying promising directions for future research. This survey will serve as a valuable reference for researchers and practitioners interested in this rapidly evolving field.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
