CV · May 3, 2025

GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting

arXiv:2505.01928v1 · 2 citations · h-index: 3
Originality: Highly original
AI Analysis

This work addresses the need for efficient multi-identity lip-sync in video synthesis, reducing computational overhead for applications like virtual avatars or dubbing.

The paper tackles lip-synced video synthesis for multiple speakers by introducing GenSync, a framework that uses 3D Gaussian Splatting and a Disentanglement Module to separate identity-specific features from audio representations. It reports 6.8x faster training than state-of-the-art models while maintaining high lip-sync accuracy and visual quality.

We introduce GenSync, a novel framework for multi-identity lip-synced video synthesis using 3D Gaussian Splatting. Unlike most existing 3D methods that require training a new model for each identity, GenSync learns a unified network that synthesizes lip-synced videos for multiple speakers. By incorporating a Disentanglement Module, our approach separates identity-specific features from audio representations, enabling efficient multi-identity video synthesis. This design reduces computational overhead and achieves 6.8x faster training compared to state-of-the-art models, while maintaining high lip-sync accuracy and visual quality.
