Deploying Speech-Driven 3D Facial Animation in Unreal Engine for Production-Ready Digital Humans
This work addresses the gap between academic research and production pipelines for digital human animation, but the contribution is incremental as it adapts existing methods to a specific engine.
The authors present a deployable system for speech-driven 3D facial animation in Unreal Engine using ARKit-compatible blendshapes, converting the MEAD corpus to create the 3DMEAD-ARKit dataset and retraining models for stochastic and emotion-controllable animations. A perceptual user study shows their system is competitive with commercial tools like MetaHuman and Audio2Face.
Speech-driven 3D facial animation research has shown promising results, but most methods rely on representations that are not compatible with production pipelines. In this work, we present a deployable system that bridges this gap by enabling speech-driven 3D facial animation directly in Unreal Engine (UE) using ARKit-compatible representations. We construct the 3DMEAD-ARKit dataset by converting the MEAD corpus into blendshape sequences using MediaPipe, and retrain FaceDiffuser and ProbTalk3D-X to generate stochastic, emotion-controllable animations. We further develop a modular UE plugin with a Python backend that supports model selection and parameter control. In a perceptual user study, we compare our results against two existing commercial tools: Epic Games' MetaHuman speech-driven animator and NVIDIA Audio2Face. The results highlight the importance of comparisons between academic and commercial pipelines. We recommend watching the supplementary video. We also plan to give live demonstrations of our work at the SIGGRAPH 2026 conference.
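To make the notion of a "blendshape sequence" concrete, the following is a minimal sketch of how per-frame blendshape scores might be packed into a fixed-order matrix before training or streaming to an engine. It is not the authors' conversion code: the function name `pack_sequence`, the five-channel subset, and the clamping and zero-fill policy are all assumptions made for illustration (the full ARKit set has 52 channels).

```python
from typing import Dict, List, Sequence

# Illustrative subset of ARKit blendshape names (assumption: a real pipeline
# would list all 52 ARKit channels in a fixed, agreed-upon order).
CHANNELS = (
    "jawOpen",
    "mouthSmileLeft",
    "mouthSmileRight",
    "eyeBlinkLeft",
    "eyeBlinkRight",
)


def pack_sequence(
    frames: Sequence[Dict[str, float]],
    channels: Sequence[str] = CHANNELS,
) -> List[List[float]]:
    """Convert per-frame {name: score} dicts into a T x C matrix.

    Hypothetical helper: channels missing from a frame default to 0.0,
    and every score is clamped to the [0, 1] range.
    """
    packed = []
    for scores in frames:
        row = [
            min(1.0, max(0.0, float(scores.get(name, 0.0))))
            for name in channels
        ]
        packed.append(row)
    return packed


# Two example frames with made-up scores, including out-of-range values.
frames = [
    {"jawOpen": 0.25, "mouthSmileLeft": 0.5},
    {"jawOpen": 1.2, "eyeBlinkLeft": -0.1},
]
matrix = pack_sequence(frames)
```

Keeping the channel order fixed means each column of the matrix always refers to the same blendshape, which is what lets a model trained on such sequences drive any rig that exposes the same named channels.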