CV | Feb 4, 2016

Leveraging Mid-Level Deep Representations For Predicting Face Attributes in the Wild

arXiv:1602.01827v3 | 43 citations
Originality: Incremental advance
AI Analysis

This work addresses robust facial attribute prediction for computer vision applications. It is an incremental advance: it builds on existing CNN methods and contributes mainly a choice of feature representation.

The paper tackles the challenge of predicting facial attributes from faces in the wild by proposing mid-level CNN features in place of high-level ones, and it reports state-of-the-art prediction accuracy on the CelebA and LFWA datasets.

Predicting facial attributes from faces in the wild is very challenging due to pose and lighting variations in the real world. The key to this problem is to build proper feature representations to cope with these unfavourable conditions. Given the success of Convolutional Neural Networks (CNNs) in image classification, the high-level CNN feature, as an intuitive and reasonable choice, has been widely utilized for this problem. In this paper, however, we consider the mid-level CNN features as an alternative to the high-level ones for attribute prediction. This is based on the observation that face attributes are different: some of them are locally oriented while others are globally defined. Our investigations reveal that the mid-level deep representations achieve higher prediction accuracy than the (fine-tuned) high-level abstractions. We empirically demonstrate that the mid-level representations achieve state-of-the-art prediction performance on the CelebA and LFWA datasets. Our investigations also show that by utilizing the mid-level representations one can employ a single deep network to achieve both face recognition and attribute prediction.
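
The comparison the abstract describes, mid-level versus high-level features for each attribute, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes features have already been extracted from each CNN layer, and it uses a nearest-centroid classifier as a simple stand-in, since the abstract does not name the classifier. The layer names ("mid", "high"), the attribute name ("toy_attr"), and the synthetic data are all hypothetical.

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_nearest_centroid(feats, labels):
    """Fit a two-class nearest-centroid model (assumed stand-in classifier)."""
    pos = centroid([f for f, y in zip(feats, labels) if y == 1])
    neg = centroid([f for f, y in zip(feats, labels) if y == 0])
    return pos, neg

def predict(model, feat):
    """Return 1 if feat is closer to the positive centroid, else 0."""
    pos, neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(feat, pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(feat, neg))
    return 1 if d_pos < d_neg else 0

def accuracy(model, feats, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, f) == y for f, y in zip(feats, labels))
    return hits / len(labels)

def best_layer_per_attribute(train, val, layers, attributes):
    """For each attribute, pick the layer with the best validation accuracy.

    train/val: {"features": {layer: [vec, ...]}, "labels": {attr: [0 or 1, ...]}}
    """
    choice = {}
    for attr in attributes:
        scores = {}
        for layer in layers:
            model = fit_nearest_centroid(train["features"][layer],
                                         train["labels"][attr])
            scores[layer] = accuracy(model, val["features"][layer],
                                     val["labels"][attr])
        choice[attr] = max(scores, key=scores.get)
    return choice

def make_split(n):
    """Synthetic toy data: only the hypothetical "mid" layer encodes the label."""
    labels = [i % 2 for i in range(n)]
    mid = [[float(y), 0.1 * (i % 3)] for i, y in enumerate(labels)]
    high = [[0.1 * (i % 3), 0.1 * (i % 5)] for i in range(n)]
    return {"features": {"mid": mid, "high": high},
            "labels": {"toy_attr": labels}}

selected = best_layer_per_attribute(make_split(30), make_split(30),
                                    ["high", "mid"], ["toy_attr"])
print(selected)
```

On this toy data the "mid" layer is selected, because only its features carry the label. With real data, the per-layer feature vectors would come from forward passes through a trained face network, and the choice could differ from one attribute to the next.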

