CV · Aug 3, 2017

CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng

arXiv:1708.00980v3 · 22.549 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and robust face reconstruction in applications like animation or security, though it is incremental as it builds on existing CNN-based approaches with improved data synthesis.

The paper tackles detailed 3D face reconstruction from 2D images. It introduces a data generation method that uses inverse rendering and detail transfer to create photo-realistic training datasets, which enable a coarse-to-fine CNN framework to achieve high-quality reconstruction in real time with less computation than state-of-the-art methods.

Convolutional neural network (CNN) based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images. The success of CNN-based methods relies on a large amount of labeled data. State-of-the-art methods synthesize such data using a coarse morphable face model, which has difficulty generating detailed photo-realistic face images (with wrinkles). This paper presents a novel face data generation method. Specifically, we render a large number of photo-realistic face images with different attributes based on inverse rendering. Furthermore, we construct a fine-detailed face image dataset by transferring different scales of details from one image to another. We also construct a large number of video-type adjacent frame pairs by simulating the distribution of real video data. With these datasets, we propose a coarse-to-fine learning framework consisting of three convolutional networks. The networks are trained for real-time detailed 3D face reconstruction from monocular video as well as from a single image. Extensive experimental results demonstrate that our framework produces high-quality reconstructions with much less computation time than the state of the art. Moreover, our method is robust to pose, expression and lighting due to the diversity of the data.
