Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction
Blendshapes GHUM enables on-device facial motion capture for applications such as virtual avatars, but the contribution is incremental because it builds on existing landmark-based methods.
The paper tackles real-time facial blendshape prediction from a single monocular RGB image, predicting 52 coefficients at 30+ FPS on modern mobile phones.
We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients from a single monocular RGB image at 30+ FPS on modern mobile phones, enabling facial motion capture applications such as virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, and ii) a lightweight real-time model that predicts blendshape coefficients from facial landmarks.
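
For readers unfamiliar with blendshape coefficients, the sketch below shows how a set of 52 coefficients is typically consumed downstream: as weights in a linear blendshape model, where the deformed mesh is the neutral mesh plus a weighted sum of per-blendshape vertex offsets. This is the generic linear formulation, not a description of the paper's rig or pipeline; the function name `apply_blendshapes`, the toy two-vertex mesh, and the clamping of weights to [0, 1] are all assumptions made for illustration.

```python
from typing import List, Sequence

Vertex = List[float]  # [x, y, z]

NUM_BLENDSHAPES = 52  # number of coefficients stated in the abstract


def apply_blendshapes(
    neutral: Sequence[Vertex],
    deltas: Sequence[Sequence[Vertex]],
    coefficients: Sequence[float],
) -> List[Vertex]:
    """Return neutral + sum_i coefficients[i] * deltas[i], per vertex.

    Hypothetical helper: a generic linear blendshape model, not the paper's code.
    """
    if len(deltas) != len(coefficients):
        raise ValueError("need exactly one coefficient per blendshape")
    # Assumption: weights are clamped to [0, 1], a common convention.
    weights = [min(1.0, max(0.0, c)) for c in coefficients]
    result = [list(v) for v in neutral]  # copy so the neutral mesh is untouched
    for weight, delta in zip(weights, deltas):
        if weight == 0.0:
            continue  # inactive blendshape contributes nothing
        for vertex, offset in zip(result, delta):
            for axis in range(3):
                vertex[axis] += weight * offset[axis]
    return result


# Toy data: a 2-vertex "mesh" with 52 blendshapes, only the first one non-zero.
neutral_mesh = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
blendshape_deltas = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]] for _ in range(NUM_BLENDSHAPES)
]
blendshape_deltas[0] = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]]

coeffs = [0.0] * NUM_BLENDSHAPES
coeffs[0] = 0.5  # half-activate the first blendshape

deformed = apply_blendshapes(neutral_mesh, blendshape_deltas, coeffs)
```

In a real avatar pipeline the coefficients would come from the predictor once per frame and the offsets from the avatar's rig; a vectorized array library would replace the nested loops.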