Fake It Till You Make It: Face analysis in the wild using synthetic data alone
Training on synthetic data alone enables face-related computer vision in the wild without manual labeling, removing a bottleneck for researchers and practitioners in fields such as surveillance and entertainment.
The paper tackles the domain gap between synthetic and real data for face analysis by generating highly realistic synthetic training data. Models trained on this data reach accuracy comparable to models trained on real data on tasks such as landmark localization and face parsing.
We demonstrate that it is possible to perform face-related computer vision in the wild using synthetic data alone. The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried to bridge this gap with data mixing, domain adaptation, and domain-adversarial training, but we show that it is possible to synthesize data with minimal domain gap, so that models trained on synthetic data generalize to real in-the-wild datasets. We describe how to combine a procedurally-generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity. We train machine learning systems for face-related tasks such as landmark localization and face parsing, showing that synthetic data can both match real data in accuracy and open up new approaches where manual labelling would be impossible.
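To make concrete why synthetic data removes the need for manual labelling, the following minimal sketch samples a face from a parametric 3D model and reads landmark labels directly off the mesh. It is an illustration under stated assumptions, not the paper's pipeline: the linear identity basis, the pinhole camera, the random toy mesh, and every name (`sample_face`, `mean_shape`, `identity_basis`, `landmark_idx`) are hypothetical, and the rendering step with hand-crafted assets is omitted entirely.

```python
import numpy as np


def sample_face(rng, mean_shape, identity_basis, landmark_idx,
                focal=500.0, center=(128.0, 128.0), depth=5.0):
    """Sample one synthetic face mesh and its 2D landmark labels.

    Assumptions (not from the paper): a linear parametric model
    vertices = mean_shape + sum_k coeffs[k] * identity_basis[k],
    and a simple pinhole camera looking down the +z axis.

    mean_shape:     (V, 3) array of mean vertex positions
    identity_basis: (K, V, 3) array of per-coefficient vertex offsets
    landmark_idx:   indices of the vertices that serve as landmarks
    """
    # Draw identity coefficients from a standard normal prior.
    coeffs = rng.standard_normal(identity_basis.shape[0])

    # Contract the K axis: (K,) with (K, V, 3) gives (V, 3) offsets.
    vertices = mean_shape + np.tensordot(coeffs, identity_basis, axes=1)

    # Translate the mesh in front of the camera, then project.
    cam = vertices + np.array([0.0, 0.0, depth])
    uv = focal * cam[:, :2] / cam[:, 2:3] + np.asarray(center)

    # Labels come for free: they are just projected mesh vertices.
    return vertices, uv[landmark_idx]


# Toy stand-in data; a real model would load a scanned-face basis.
rng = np.random.default_rng(0)
num_vertices, num_coeffs = 100, 10
mean_shape = rng.uniform(-1.0, 1.0, size=(num_vertices, 3))
identity_basis = 0.01 * rng.standard_normal((num_coeffs, num_vertices, 3))
landmark_idx = np.array([0, 10, 20])

vertices, landmarks_2d = sample_face(rng, mean_shape, identity_basis, landmark_idx)
```

Because the labels are computed from the same geometry that would be rendered, they are exact by construction and can be as dense as the mesh itself, which is the kind of annotation that would be impractical to produce by hand.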