CV | LG | SD | AS | Nov 29, 2024

Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

arXiv:2411.19509v3 | 37 citations | h-index: 6 | MM
Originality: Incremental advance
AI Analysis

Interactive applications such as AI assistants need talking head synthesis that is both real-time and controllable. This work addresses that need and represents an incremental improvement over existing diffusion-based methods.

The paper tackles the slow inference and limited control of diffusion-based talking head synthesis by proposing Ditto, a framework that supports fine-grained control and real-time inference. The authors report that Ditto generates compelling videos and shows superior controllability and real-time performance.

Recent advances in diffusion models have endowed talking head synthesis with subtle expressions and vivid head movements, but have also led to slow inference speed and insufficient control over generated results. To address these issues, we propose Ditto, a diffusion-based talking head framework that enables fine-grained controls and real-time inference. Specifically, we utilize an off-the-shelf motion extractor and devise a diffusion transformer to generate representations in a specific motion space. We optimize the model architecture and training strategy to address the issues in generating motion representations, including insufficient disentanglement between motion and identity, and large internal discrepancies within the representation. Besides, we employ diverse conditional signals while establishing a mapping between motion representation and facial semantics, enabling control over the generation process and correction of the results. Moreover, we jointly optimize the holistic framework to enable streaming processing, real-time inference, and low first-frame delay, offering functionalities crucial for interactive applications such as AI assistants. Extensive experimental results demonstrate that Ditto generates compelling talking head videos and exhibits superiority in both controllability and real-time performance.

Code Implementations: 2 repos
