CV · Feb 4, 2016

Leveraging Mid-Level Deep Representations For Predicting Face Attributes in the Wild

Yang Zhong, Josephine Sullivan, Haibo Li

arXiv:1602.01827v3 · 8.443 citations

Originality: Incremental advance

AI Analysis

This work addresses robust facial attribute prediction for computer vision applications. It is an incremental advance: it builds on existing CNN methods and changes only which feature representation is selected.

The paper tackles the challenge of predicting attributes of faces in the wild by proposing mid-level CNN features in place of high-level ones, and reports state-of-the-art prediction accuracy on the CelebA and LFWA datasets.

Predicting facial attributes from faces in the wild is very challenging due to pose and lighting variations in the real world. The key to this problem is to build proper feature representations to cope with these unfavourable conditions. Given the success of Convolutional Neural Networks (CNNs) in image classification, the high-level CNN feature, as an intuitive and reasonable choice, has been widely utilized for this problem. In this paper, however, we consider the mid-level CNN features as an alternative to the high-level ones for attribute prediction. This is based on the observation that face attributes are different: some of them are locally oriented while others are globally defined. Our investigations reveal that the mid-level deep representations achieve higher prediction accuracy than the (fine-tuned) high-level abstractions. We empirically demonstrate that the mid-level representations achieve state-of-the-art prediction performance on the CelebA and LFWA datasets. Our investigations also show that by utilizing the mid-level representations one can employ a single deep network to achieve both face recognition and attribute prediction.
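The idea of reading features from an intermediate layer instead of the final one can be illustrated with a toy sketch. This is not the authors' implementation: the paper works with 2-D CNNs on face images, whereas the example below assumes a tiny 1-D network with random kernels, and every name in it (`forward_with_taps`, `linear_score`, the zero weights) is hypothetical. It only shows how activations after each stage can be recorded so that a per-attribute linear classifier can be attached to a mid-level tap.

```python
import random

def conv1d_relu(signal, kernel):
    """Valid 1-D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

def forward_with_taps(signal, kernels):
    """Run the toy network and record the activation after every stage."""
    taps = []
    x = signal
    for kernel in kernels:
        x = max_pool(conv1d_relu(x, kernel))
        taps.append(x)
    return taps

def linear_score(features, weights, bias=0.0):
    """Score one attribute with a linear model on the chosen features."""
    return sum(f * w for f, w in zip(features, weights)) + bias

rng = random.Random(0)
# Assumption: three conv stages with random (untrained) kernels.
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# Assumption: a 64-value vector stands in for a face image.
image_row = [rng.random() for _ in range(64)]

taps = forward_with_taps(image_row, kernels)
mid_level = taps[1]    # intermediate stage: finer spatial resolution
high_level = taps[-1]  # final stage: most abstract, coarsest resolution

# Hypothetical untrained attribute classifier on the mid-level tap.
weights = [0.0] * len(mid_level)
score = linear_score(mid_level, weights, bias=0.5)
```

In this toy setup the mid-level tap keeps more spatial positions than the final one, which loosely mirrors the abstract's observation that some attributes are locally oriented. In practice the weights would be trained per attribute on labelled data.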
