CV · Jun 17, 2021

Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation

arXiv:2106.09614v3 · 34 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of robust face reconstruction for computer vision applications by handling variable outliers without costly annotations, though it is incremental as it builds on existing model-based methods.

The paper tackles the problem of improving model-based 3D face reconstruction by avoiding fitting to outliers like occluders or makeup, using a joint face-autoencoder and outlier segmentation approach (FOCUS) that resolves mutual dependencies through EM-type training. It achieves state-of-the-art reconstruction on the NoW testset without 3D annotations and accurately localizes occluders on datasets like CelebA-HQ and AR without segmentation labels.

In this work, we aim to enhance model-based face reconstruction by avoiding fitting the model to outliers, i.e. regions that cannot be well-expressed by the model such as occluders or make-up. The core challenge for localizing outliers is that they are highly variable and difficult to annotate. To overcome this challenging problem, we introduce a joint Face-autoencoder and outlier segmentation approach (FOCUS). In particular, we exploit the fact that the outliers cannot be fitted well by the face model and hence can be localized well given a high-quality model fitting. The main challenge is that the model fitting and the outlier segmentation are mutually dependent on each other, and need to be inferred jointly. We resolve this chicken-and-egg problem with an EM-type training strategy, where a face autoencoder is trained jointly with an outlier segmentation network. This leads to a synergistic effect, in which the segmentation network prevents the face encoder from fitting to the outliers, enhancing the reconstruction quality. The improved 3D face reconstruction, in turn, enables the segmentation network to better predict the outliers. To resolve the ambiguity between outliers and regions that are difficult to fit, such as eyebrows, we build a statistical prior from synthetic data that measures the systematic bias in model fitting. Experiments on the NoW testset demonstrate that FOCUS achieves SOTA 3D face reconstruction performance among all baselines that are trained without 3D annotation. Moreover, our results on CelebA-HQ and the AR database show that the segmentation network can localize occluders accurately despite being trained without any segmentation annotation.
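
The chicken-and-egg structure described in the abstract (a good fit is needed to find outliers, and outliers must be known to get a good fit) can be illustrated with a deliberately tiny toy problem. The sketch below is not the FOCUS method: a least-squares line fit stands in for the face autoencoder, a fixed residual threshold stands in for the segmentation network, and the names `fit_line`, `alternate_fit_and_segment`, and `threshold` are hypothetical choices made only for this illustration.

```python
def fit_line(xs, ys, mask):
    """Least-squares line fit using only the points where mask is True."""
    pts = [(x, y) for x, y, keep in zip(xs, ys, mask) if keep]
    if len(pts) < 2:
        raise ValueError("need at least two inliers to fit a line")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pts)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def alternate_fit_and_segment(xs, ys, threshold, max_iterations=10):
    """Alternate between fitting the model and re-estimating the outlier mask."""
    mask = [True] * len(xs)  # assumption: start by trusting every point
    for _ in range(max_iterations):
        # Fit step: estimate model parameters from the current inliers only.
        slope, intercept = fit_line(xs, ys, mask)
        # Segmentation step: points the model explains poorly become outliers.
        residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
        new_mask = [r < threshold for r in residuals]
        if new_mask == mask:  # mask is stable, so the fit will not change either
            break
        mask = new_mask
    return slope, intercept, mask


# Toy data: points on y = 2x + 1, with two "occluded" samples shifted by +20.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[3] += 20
ys[7] += 20

slope, intercept, mask = alternate_fit_and_segment(xs, ys, threshold=8.0)
print(slope, intercept, mask)
```

In this toy run the first fit is pulled toward the two shifted points, but their residuals are still the largest, so they are masked out and the second fit uses only the clean points. The hard threshold is the weakest part of the sketch; the abstract describes learning the segmentation and using a statistical prior for hard-to-fit regions instead.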

Code Implementations: 1 repo
