CV · Mar 3, 2018

Unsupervised Learning of Face Representations

arXiv:1803.01260v1 · 22 citations
Originality: Incremental advance
AI Analysis

This addresses the problem of learning face representations without manual supervision, particularly for low-resolution faces in surveillance scenarios, representing an incremental improvement over existing methods.

The paper tackles unsupervised learning of face representations by mining training data from videos, using the constraints that faces in the same frame are different persons and tracked faces across frames are the same person. It achieves higher verification accuracy on the LFW benchmark compared to hand-crafted features and surpasses state-of-the-art deep networks like VGG-Face when using low-resolution inputs.

We present an approach for unsupervised training of CNNs to learn discriminative face representations. We mine supervised training data by noting that multiple faces in the same video frame must belong to different persons, and the same face tracked across multiple frames must belong to the same person. We obtain millions of face pairs from hundreds of videos without any manual supervision. Although faces extracted from videos have a lower spatial resolution than those available in standard supervised face datasets such as LFW and CASIA-WebFace, they represent a much more realistic setting, e.g. surveillance scenarios where most detected faces are very small. We train our CNNs on the relatively low-resolution faces extracted from the collected video frames and achieve higher verification accuracy on the benchmark LFW dataset than hand-crafted features such as LBPs. Our models even surpass state-of-the-art deep networks such as VGG-Face when those networks are made to work with low-resolution input images.
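The two mining constraints can be illustrated with a short sketch. It assumes a hypothetical detection format of `(frame_index, track_id, face_id)` tuples produced by some face detector and tracker, with at most one detection per track per frame; the function name `mine_pairs` and this data layout are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict
from itertools import combinations


def mine_pairs(detections):
    """Mine (positive_pairs, negative_pairs) of face ids from video detections.

    Hypothetical input: iterable of (frame_index, track_id, face_id), where
    track_id links detections of one tracked face across frames.
    """
    by_frame = defaultdict(list)  # frame_index -> face ids seen in that frame
    by_track = defaultdict(list)  # track_id -> (frame_index, face_id) entries
    for frame, track, face in detections:
        by_frame[frame].append(face)
        by_track[track].append((frame, face))

    # Negatives: two faces visible in the same frame belong to different people.
    negatives = set()
    for faces in by_frame.values():
        for a, b in combinations(sorted(faces), 2):
            negatives.add((a, b))

    # Positives: detections of one track in different frames are the same person.
    positives = set()
    for items in by_track.values():
        for (fa, a), (fb, b) in combinations(sorted(items), 2):
            if fa != fb:
                positives.add(tuple(sorted((a, b))))
    return positives, negatives


# Two people tracked over two frames.
detections = [(0, "t1", "f0"), (0, "t2", "f1"), (1, "t1", "f2"), (1, "t2", "f3")]
positives, negatives = mine_pairs(detections)
```

Here `positives` pairs each face with itself one frame later, and `negatives` pairs the two faces that share a frame. A fuller pipeline would likely also treat every face of one track as a negative for every face of a co-occurring track, which multiplies the number of negative pairs.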

Code Implementations: 1 repo
