An Empirical Study of Recent Face Alignment Methods
This work addresses reproducibility and fairness issues in face alignment research and offers practical guidance for the computer vision community; its contribution is incremental in that it focuses on evaluation rather than proposing new methods.
The paper tackles inconsistent and hard-to-reproduce evaluations of face alignment methods through a rigorous empirical study. It introduces a new metric (AUC$_α$) and extends the 300W dataset to allow fair comparisons, yielding insights that cannot be obtained from the original publications.
The problem of face alignment has been intensively studied in recent years. A large number of novel methods have been proposed, reporting very good performance on benchmark datasets such as 300W. However, differences in experimental settings and evaluation metrics, together with missing details in the descriptions of the methods, make it hard to reproduce the reported results and to evaluate the methods' relative merits. For instance, most recent face alignment methods are built on top of face detection but rely on different face detectors. In this paper, we carry out a rigorous evaluation of these methods by making the following contributions: 1) we propose a new evaluation metric for face alignment on a set of images, namely the area under the error distribution curve within a threshold, AUC$_α$, motivated by the fact that the traditional evaluation measure (mean error) is very sensitive to large alignment errors. 2) we extend the 300W database with more practical face detections to make fair comparison possible. 3) we carry out a sensitivity analysis of face alignment w.r.t. face detection, on both synthetic and real data, using both off-the-shelf and retrained models. 4) we study the factors that are particularly important for achieving good performance and provide suggestions for practical applications. Most of the conclusions drawn from our comparative analysis cannot be inferred from the original publications.
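To make the motivation for AUC$_α$ concrete, the following is a minimal sketch of how such a metric could be computed from per-image normalized errors. It assumes that the "error distribution curve" is the empirical cumulative distribution of the errors and that the area is divided by α so the score lies in [0, 1]. The function name `auc_alpha`, the `normalize` flag, and the example error values are all illustrative assumptions, not taken from the paper. The sketch uses the fact that the area under an empirical CDF on [0, α] equals the mean of max(0, α − e_i).

```python
from typing import Sequence


def auc_alpha(errors: Sequence[float], alpha: float, normalize: bool = True) -> float:
    """Area under the empirical cumulative error distribution on [0, alpha].

    Hypothetical helper: the normalization by alpha is an assumption made
    here so that a perfect result scores 1.0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(errors) == 0:
        raise ValueError("errors must be non-empty")
    # Each image contributes a unit step at its error e_i to the empirical CDF.
    # The area under that step on [0, alpha] is max(0, alpha - e_i), so the
    # integral of the CDF is the mean of those clipped terms.
    area = sum(max(0.0, alpha - e) for e in errors) / len(errors)
    return area / alpha if normalize else area


def mean_error(errors: Sequence[float]) -> float:
    """Traditional measure: plain average of the per-image errors."""
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up errors: nine well-aligned images and one gross failure.
    clean = [0.02] * 10
    with_outlier = [0.02] * 9 + [5.0]
    for name, errs in (("clean", clean), ("with outlier", with_outlier)):
        print(name, "mean error:", mean_error(errs), "AUC:", auc_alpha(errs, alpha=0.1))
```

In this toy comparison, a single gross failure dominates the mean error, whereas its effect on the AUC-style score is bounded, because any error at or above α contributes zero area no matter how large it is.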