CV · Oct 29, 2024

OFER: Occluded Face Expression Reconstruction

Pratheba Selvaraju, Victoria Fernandez Abrevaya, Timo Bolkart, Rick Akkerman, Tianyu Ding, Faezeh Amjadi, Ilya Zharkov

arXiv:2410.21629v23.71 citations · h-index: 11 · CVPR

Originality: Incremental advance

AI Analysis

This work addresses occluded face reconstruction for computer vision applications, but the advance is incremental: it builds on existing parametric face models and diffusion techniques.

The paper tackles 3D face reconstruction from a single image with occlusions. It introduces OFER, which uses diffusion models to generate diverse shape and expression coefficients. The reported results improve on existing occlusion-based methods, and the approach can generate diverse expressions for a given image.

Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to generate the shape and expression coefficients of a face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. However, to maintain consistency across diverse expressions, the challenge is to select the best matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores. We evaluate our method using standard benchmarks and introduce CO-545, a new protocol and dataset designed to assess the accuracy of expressive faces under occlusion. Our results show improved performance over occlusion-based methods, while also enabling the generation of diverse expressions for a given image.
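The sample-then-rank flow described in the abstract can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: `toy_sampler` stands in for each trained conditional diffusion model, `toy_score` stands in for the learned shape accuracy predictor, and the coefficient dimensions (4 and 3) are arbitrary placeholders. Only the overall structure follows the text: sample several shape hypotheses, rank them by predicted score, keep the top one, and pair it with several sampled expressions.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def toy_sampler(dim: int) -> Callable[[Vector, random.Random], Vector]:
    """Hypothetical stand-in for a conditional diffusion sampler.

    A real sampler would run iterative denoising conditioned on image
    features; this one only adds Gaussian noise around the condition mean.
    """
    def sample(condition: Vector, rng: random.Random) -> Vector:
        bias = sum(condition) / len(condition)
        return [bias + rng.gauss(0.0, 1.0) for _ in range(dim)]
    return sample


def toy_score(shape: Vector) -> float:
    """Hypothetical stand-in for the predicted shape accuracy score.

    Higher is better; here shapes closer to the origin score higher.
    """
    return -sum(x * x for x in shape)


def rank_shapes(shapes: List[Vector],
                score_fn: Callable[[Vector], float]) -> List[Vector]:
    """Sort shape hypotheses from highest to lowest predicted score."""
    return sorted(shapes, key=score_fn, reverse=True)


def reconstruct(condition: Vector, n_shapes: int, n_expressions: int,
                seed: int = 0) -> List[Tuple[Vector, Vector]]:
    """Return (shape, expression) pairs that share one selected shape."""
    rng = random.Random(seed)
    sample_shape = toy_sampler(dim=4)       # assumed shape dimension
    sample_expression = toy_sampler(dim=3)  # assumed expression dimension

    # 1. Draw several shape hypotheses for the same input.
    shapes = [sample_shape(condition, rng) for _ in range(n_shapes)]
    # 2. Rank them and keep the top-scoring shape for consistency.
    best_shape = rank_shapes(shapes, toy_score)[0]
    # 3. Draw diverse expressions and attach each to the selected shape.
    expressions = [sample_expression(condition, rng)
                   for _ in range(n_expressions)]
    return [(best_shape, expression) for expression in expressions]


if __name__ == "__main__":
    faces = reconstruct([0.1, -0.2, 0.3], n_shapes=8, n_expressions=5)
    print(len(faces), "hypotheses sharing one shape")
```

Fixing a single shape before sampling expressions mirrors the stated goal of keeping identity consistent across diverse expressions; how the actual ranking network is trained and what it takes as input are not specified here.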
