Driving Animatronic Robot Facial Expression From Speech
This enables more natural human-robot interaction through lifelike facial expressions, representing a strong specific gain in robotics.
The paper tackled the problem of generating realistic, speech-synchronized facial expressions for animatronic robots by introducing a skinning-centric approach using linear blend skinning, achieving real-time performance at over 4000 fps on a single Nvidia RTX 4090.
Animatronic robots hold the promise of enabling natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions poses significant challenges due to the complexities of facial biomechanics and the need for responsive motion synthesis. This paper introduces a novel, skinning-centric approach to drive animatronic robot facial expressions from speech input. At its core, the proposed approach employs linear blend skinning (LBS) as a unifying representation, guiding innovations in both embodiment design and motion synthesis. LBS informs the actuation topology, facilitates human expression retargeting, and enables efficient speech-driven facial motion generation. This approach demonstrates the capability to produce highly realistic facial expressions on an animatronic face in real-time at over 4000 fps on a single Nvidia RTX 4090, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction. To foster further research and development in this field, the code has been made publicly available at: \url{https://github.com/library87/OpenRoboExp}.