CV · Aug 19, 2016

A Recurrent Encoder-Decoder Network for Sequential Face Alignment

arXiv:1608.05477v2 · 142 citations
AI Analysis

This work aims to improve the accuracy and generalization of face alignment in video. It is an incremental advance built from novel method components.

The paper tackles real-time video-based face alignment by proposing a recurrent encoder-decoder network that uses spatial and temporal recurrent learning to predict 2D facial point maps, achieving significantly more accurate results than state-of-the-art methods on standard datasets.
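One way to picture a "2D facial point map" is as one 2D response map per landmark, peaking at that landmark's location. The sketch below assumes a Gaussian peak of hypothetical width `sigma` and recovers each landmark by taking the arg-max. The helper names `encode_point_map` and `decode_point_map` are illustrative only, and the sketch reproduces neither the paper's exact map construction nor its regression loss.

```python
import math


def encode_point_map(x, y, height, width, sigma=1.0):
    """Render one landmark at pixel (x, y) as a 2D Gaussian response map (list of rows).

    The Gaussian shape and sigma are assumptions made for illustration.
    """
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2)) for c in range(width)]
        for r in range(height)
    ]


def decode_point_map(pmap):
    """Recover the landmark as the (x, y) location of the strongest response."""
    _, x, y = max((v, c, r) for r, row in enumerate(pmap) for c, v in enumerate(row))
    return x, y


landmarks = [(2, 3), (6, 1), (4, 5)]  # hypothetical (x, y) pixel coordinates
point_maps = [encode_point_map(x, y, height=8, width=8) for x, y in landmarks]
decoded = [decode_point_map(m) for m in point_maps]
print(decoded)  # → [(2, 3), (6, 1), (4, 5)]
```

Because each map peaks at exactly 1.0 on its landmark's pixel, the arg-max round trip is exact for integer coordinates. A real system would typically refine to sub-pixel precision.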

We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning in both the spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, enabling iterative coarse-to-fine face alignment with a single network model. At the temporal level, we first decouple the features in the bottleneck of the network into temporal-variant factors, such as pose and expression, and temporal-invariant factors, such as identity information. Temporal recurrent learning is then applied to the decoupled temporal-variant features, yielding better generalization and significantly more accurate results at test time. We perform a comprehensive experimental analysis, showing the importance of each component of our proposed model, as well as superior results to the state of the art on standard datasets.
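The two recurrences described in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. It is not the authors' architecture. Each convolutional encoder or decoder is replaced by one hypothetical dense tanh layer (`layer`), the temporal module is a plain Elman-style cell rather than whatever recurrent unit the paper actually uses, and the sizes `FEAT`, `MAP`, `VARIANT`, `INVARIANT` and the bottleneck split are all assumptions. What it shows is the control flow: the response map is fed back as input for several refinement steps that share the same weights, and only the temporal-variant slice of the bottleneck carries state across frames.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

FEAT = 8       # hypothetical size of a flattened input frame
MAP = 4        # hypothetical size of the flattened response map
VARIANT = 3    # bottleneck units treated as temporal-variant (pose, expression)
INVARIANT = 3  # bottleneck units treated as temporal-invariant (identity)


def rand_matrix(rows, cols, scale=0.5):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def layer(W, x):
    # One dense tanh layer, standing in for a whole conv encoder or decoder.
    return [math.tanh(v) for v in matvec(W, x)]


W_enc = rand_matrix(VARIANT + INVARIANT, FEAT + MAP)  # sees frame + fed-back map
W_dec = rand_matrix(MAP, VARIANT + INVARIANT)
W_xh = rand_matrix(VARIANT, VARIANT)  # temporal cell: input-to-hidden
W_hh = rand_matrix(VARIANT, VARIANT)  # temporal cell: hidden-to-hidden


def track(frames, steps=3):
    """Return one response map per frame, refined `steps` times per frame."""
    h = [0.0] * VARIANT  # temporal state carried from frame to frame
    outputs = []
    for frame in frames:
        response = [0.0] * MAP  # each frame starts from an empty response map
        h_new = h
        for _ in range(steps):
            # Spatial recurrence: the current response map is fed back as extra
            # input, so the same weights perform every coarse-to-fine step.
            code = layer(W_enc, frame + response)
            variant, invariant = code[:VARIANT], code[VARIANT:]
            # Temporal recurrence updates only the temporal-variant units.
            h_new = [math.tanh(a + b) for a, b in zip(matvec(W_xh, variant), matvec(W_hh, h))]
            response = layer(W_dec, h_new + invariant)
        h = h_new  # commit the temporal state once the frame is refined
        outputs.append(response)
    return outputs


frames = [[random.uniform(0, 1) for _ in range(FEAT)] for _ in range(2)]
maps = track([frames[0], frames[0], frames[1]])
print(len(maps), len(maps[0]))  # → 3 4
```

Feeding the same frame twice yields different maps, because the temporal state has changed between the two calls, while the identity slice reaches the decoder without any recurrence. Committing `h` once per frame, rather than after every refinement step, is a design choice of this sketch.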
