SD | AI | IV | Sep 21, 2025

PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control

arXiv:2509.16922v1 | h-index: 3 | ICONIP
Originality: Incremental advance
AI Analysis

This work addresses the need for real-time, high-fidelity talking head synthesis for applications like virtual reality and digital avatars, representing an incremental improvement over prior 3DGS-based methods.

The paper tackles low rendering efficiency and suboptimal audio-visual synchronization in audio-driven talking head generation. It proposes PGSTalker, a framework based on 3D Gaussian Splatting with pixel-aware density control and a multimodal fusion module, and reports better rendering quality, lip-sync precision, and inference speed than existing methods.

Audio-driven talking head generation is crucial for applications in virtual reality, digital avatars, and film production. While NeRF-based methods enable high-fidelity reconstruction, they suffer from low rendering efficiency and suboptimal audio-visual synchronization. This work presents PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere. Additionally, we introduce a lightweight Multimodal Gated Fusion Module to effectively fuse audio and spatial features, thereby improving the accuracy of Gaussian deformation prediction. Extensive experiments on public datasets demonstrate that PGSTalker outperforms existing NeRF- and 3DGS-based approaches in rendering quality, lip-sync precision, and inference speed. Our method exhibits strong generalization capabilities and practical potential for real-world deployment.
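The abstract names a lightweight Multimodal Gated Fusion Module but does not specify its layers. As a rough illustration of what gated fusion of audio and spatial features generally looks like, here is a minimal pure-Python sketch. It is an assumption-laden example, not the paper's architecture: the class name `GatedFusion`, the feature dimensions, the single-layer sigmoid gate, and the random initialisation are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Apply y = W x + b for a row-major weight matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_linear(out_dim, in_dim, rng):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class GatedFusion:
    """Hypothetical gated fusion: a learned gate blends two projected inputs.

    ASSUMPTION: this is a generic gating pattern, not PGSTalker's module.
    """

    def __init__(self, audio_dim, spatial_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Project each modality into a shared hidden space.
        self.audio_proj = init_linear(hidden_dim, audio_dim, rng)
        self.spatial_proj = init_linear(hidden_dim, spatial_dim, rng)
        # The gate sees both raw inputs concatenated.
        self.gate = init_linear(hidden_dim, audio_dim + spatial_dim, rng)

    def __call__(self, audio_feat, spatial_feat):
        a = linear(*self.audio_proj, audio_feat)
        s = linear(*self.spatial_proj, spatial_feat)
        joint = list(audio_feat) + list(spatial_feat)
        # Per-channel gate in (0, 1): near 1 favours audio, near 0 favours spatial.
        g = [sigmoid(z) for z in linear(*self.gate, joint)]
        return [gi * ai + (1.0 - gi) * si for gi, ai, si in zip(g, a, s)]


# Example: fuse a 4-d audio feature with a 3-d spatial feature into 5 channels.
fusion = GatedFusion(audio_dim=4, spatial_dim=3, hidden_dim=5)
fused = fusion([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0])
```

Because the gate is a sigmoid, each fused channel is a convex combination of the two projected features. In a pipeline like the one the abstract describes, a vector such as `fused` would presumably feed the network that predicts Gaussian deformations, but that wiring is not given in the source.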

