Deep Energies for Estimating Three-Dimensional Facial Pose and Expression
This work addresses the challenge of consistent and efficient facial performance capture for animation and visual effects. It builds on existing optimization techniques, so the contribution may be incremental.
The paper estimates 3D facial pose and expression by replacing subjective, hand-drawn rotoscope curves with curves and keypoints detected automatically by neural networks, which removes artist subjectivity and the ad-hoc procedures used to mimic it.
While much progress has been made in capturing high-quality facial performances using motion capture markers and shape-from-shading, high-end systems typically also rely on rotoscope curves hand-drawn on the image. These curves are subjective and difficult to draw consistently; moreover, ad-hoc procedural methods are required for generating matching rotoscope curves on synthetic renders embedded in the optimization used to determine three-dimensional facial pose and expression. We propose an alternative approach whereby these curves and other keypoints are detected automatically on both the image and the synthetic renders using trained neural networks, eliminating artist subjectivity and the ad-hoc procedures meant to mimic it. More generally, we propose using machine learning networks to implicitly define deep energies which, when minimized using classical optimization techniques, lead to three-dimensional facial pose and expression estimation.
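To make the idea of a deep energy concrete, the following is a minimal toy sketch and not the paper's implementation. It assumes the energy is half the squared difference between keypoints detected on the image and keypoints detected on a synthetic render, and it minimizes that energy with Gauss-Newton, chosen here only as one example of a classical optimizer. The function `detect_on_render`, the four-point `BASE` shape, the `BLEND` offsets, and the two-parameter `theta` (one rotation angle, one expression weight) are all hypothetical stand-ins for a real renderer and a trained network.

```python
import math

# Hypothetical toy "face": four neutral 2D keypoints and one expression offset each.
BASE = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
BLEND = [(0.0, 0.2), (0.0, 0.2), (0.0, 0.0), (0.0, -0.5)]

def detect_on_render(theta):
    """Stand-in for rendering the model and running a keypoint network on the render."""
    angle, weight = theta
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for (bx, by), (dx, dy) in zip(BASE, BLEND):
        x, y = bx + weight * dx, by + weight * dy   # apply expression
        out += [c * x - s * y, s * x + c * y]       # apply pose (2D rotation)
    return out

def residual(theta, target):
    """Difference between render detections and image detections."""
    return [p - t for p, t in zip(detect_on_render(theta), target)]

def energy(theta, target):
    """The assumed energy: half the sum of squared keypoint differences."""
    return 0.5 * sum(r * r for r in residual(theta, target))

def jacobian(theta, target, eps=1e-6):
    """Forward-difference Jacobian, returned as one column per parameter."""
    r0 = residual(theta, target)
    cols = []
    for j in range(len(theta)):
        shifted = list(theta)
        shifted[j] += eps
        rj = residual(shifted, target)
        cols.append([(a - b) / eps for a, b in zip(rj, r0)])
    return cols

def gauss_newton(theta0, target, iters=20):
    """Minimize the energy by repeatedly solving the 2x2 normal equations."""
    theta = list(theta0)
    for _ in range(iters):
        r = residual(theta, target)
        ja, jw = jacobian(theta, target)
        aa = sum(v * v for v in ja)
        ww = sum(v * v for v in jw)
        aw = sum(u * v for u, v in zip(ja, jw))
        ga = sum(u * v for u, v in zip(ja, r))
        gw = sum(u * v for u, v in zip(jw, r))
        det = aa * ww - aw * aw
        # Solve (J^T J) delta = -J^T r by Cramer's rule.
        theta[0] += (-ga * ww + gw * aw) / det
        theta[1] += (-gw * aa + ga * aw) / det
    return theta

# Usage: the "image detections" are synthesized from known parameters,
# then recovered starting from a neutral pose and expression.
true_theta = [0.3, 0.8]
target = detect_on_render(true_theta)
estimate = gauss_newton([0.0, 0.0], target)
print(estimate, energy(estimate, target))
```

In this sketch the detector is a closed-form function, so finite differences are enough to obtain derivatives. With an actual network in the loop, the derivatives of the detections with respect to pose and expression parameters would more plausibly come from backpropagation through the network and renderer.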