Attention Mesh: High-fidelity Face Mesh Prediction in Real-time
The work enables AR makeup, eye tracking, and puppeteering on mobile devices, but the contribution is incremental: its main gain is speed over existing methods at the same accuracy.
The paper tackles the problem of real-time 3D face mesh prediction for AR applications by introducing Attention Mesh, a lightweight architecture that runs at over 50 FPS on a Pixel 2 phone and matches the accuracy of multi-stage cascaded approaches while being 30% faster.
We present Attention Mesh, a lightweight architecture for 3D face mesh prediction that uses attention to semantically meaningful regions. Our neural network is designed for real-time on-device inference and runs at over 50 FPS on a Pixel 2 phone. Our solution enables applications like AR makeup, eye tracking and AR puppeteering that rely on highly accurate landmarks for eye and lips regions. Our main contribution is a unified network architecture that achieves the same accuracy on facial landmarks as a multi-stage cascaded approach, while being 30 percent faster.
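The two speed figures above can be put on a common footing with a little arithmetic. The sketch below is illustrative only: it converts the reported 50 FPS lower bound into a per-frame time budget, and shows what "30 percent faster" would imply for the cascaded baseline under two possible readings (30 percent less time per frame, or 30 percent higher throughput). The abstract does not say which reading is intended, and the function names and the 20 ms input are assumptions made for the example, not measurements from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / fps


def baseline_latency_ms(unified_ms: float, speedup: float = 0.30,
                        reading: str = "time") -> float:
    """Latency of the slower baseline implied by a relative speedup.

    reading="time":       unified takes (1 - speedup) of the baseline's time.
    reading="throughput": unified processes (1 + speedup) times as many frames.
    Both readings are hypothetical interpretations of "30 percent faster".
    """
    if reading == "time":
        return unified_ms / (1.0 - speedup)
    if reading == "throughput":
        return unified_ms * (1.0 + speedup)
    raise ValueError("reading must be 'time' or 'throughput'")


# 50 FPS is the reported lower bound, so 20 ms is an upper bound per frame.
budget = frame_budget_ms(50.0)
print(f"budget at 50 FPS: {budget:.1f} ms per frame")

# Assumed for illustration: the unified model uses the full 20 ms budget.
for reading in ("time", "throughput"):
    implied = baseline_latency_ms(budget, reading=reading)
    print(f"implied baseline latency ({reading} reading): {implied:.1f} ms")
```

Under either reading, a baseline that is 30 percent slower would exceed the 20 ms budget if the unified model sat exactly at it, which is why the speedup matters for holding 50 FPS on device.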