CV · GR · Apr 22, 2022

Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic Images in Facial Capture Pipelines

arXiv:2204.10746v2 · 5 citations · h-index: 62
Originality: Incremental advance
AI Analysis

This paper addresses facial capture for applications such as animation and VR by enabling robust tracking from low-quality video. Because it builds on existing deepfake methods, it is an incremental advance.

The paper tackles the problem of building and tracking 3D facial models from in-the-wild video data by using deepfake technology to bridge the synthetic-to-real domain gap, resulting in a pipeline that avoids the need for real-world ground truth or high-end capture setups.

We propose an end-to-end pipeline for both building and tracking 3D facial models from personalized in-the-wild (cellphone, webcam, YouTube clips, etc.) video data. First, we present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines. Subsequently, we utilize synthetic turntables and leverage deepfake technology in order to build a synthetic multi-view stereo pipeline for appearance capture that is robust to imperfect synthetic geometry and image misalignment. The resulting model is fit with an animation rig, which is then used to track facial performances. Notably, our novel use of deepfake technology enables us to perform robust tracking of in-the-wild data using differentiable renderers despite a significant synthetic-to-real domain gap. Finally, we outline how we train a motion capture regressor, leveraging the aforementioned techniques to avoid the need for real-world ground truth data and/or a high-end calibrated camera capture setup.
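
The abstract mentions data retrieval built on a hierarchical clustering framework of the kind used in collision detection, but gives no implementation details. As a rough illustration only, the sketch below assumes that this resembles a bounding volume hierarchy over per-frame descriptors. The descriptor choice (a hypothetical yaw/pitch head pose per frame), the median split, the axis-aligned boxes, and the names `Node`, `build`, `box_distance`, and `nearest` are all assumptions made for this example, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Node:
    lo: List[float]                 # lower corner of the axis-aligned bounding box
    hi: List[float]                 # upper corner of the axis-aligned bounding box
    indices: List[int]              # frame indices stored here (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Vector], indices: Optional[List[int]] = None,
          leaf_size: int = 2) -> Node:
    """Recursively split frames at the median of the widest axis (leaf_size >= 1)."""
    if indices is None:
        indices = list(range(len(points)))
    dims = len(points[0])
    lo = [min(points[i][d] for i in indices) for d in range(dims)]
    hi = [max(points[i][d] for i in indices) for d in range(dims)]
    if len(indices) <= leaf_size:
        return Node(lo, hi, indices)
    axis = max(range(dims), key=lambda d: hi[d] - lo[d])
    order = sorted(indices, key=lambda i: points[i][axis])
    mid = len(order) // 2
    return Node(lo, hi, [],
                build(points, order[:mid], leaf_size),
                build(points, order[mid:], leaf_size))


def box_distance(q: Vector, node: Node) -> float:
    """Euclidean distance from q to the node's box (0 if q lies inside it)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))


def nearest(points: Sequence[Vector], root: Node, q: Vector) -> Tuple[int, float]:
    """Find the frame closest to q, skipping subtrees whose box is too far away."""
    best_i, best_d = -1, math.inf
    stack = [root]
    while stack:
        node = stack.pop()
        if box_distance(q, node) >= best_d:
            continue  # nothing in this subtree can beat the current best
        if node.left is None:
            for i in node.indices:
                d = math.dist(q, points[i])
                if d < best_d:
                    best_i, best_d = i, d
        else:
            # Push the farther child first so the closer one is visited first.
            stack.extend(sorted((node.left, node.right),
                                key=lambda c: box_distance(q, c), reverse=True))
    return best_i, best_d


# Hypothetical per-frame descriptors: (yaw, pitch) in degrees.
poses = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
tree = build(poses)
frame, dist = nearest(poses, tree, (9.0, 1.0))
print(frame)  # prints 1
```

The pruning test in `nearest` mirrors the broad-phase culling used in collision detection: a subtree is explored only if its bounding box could contain a closer frame than the best match so far.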

