CV · Aug 21, 2018

Self-supervised learning of a facial attribute embedding from video

arXiv:1808.06882v1 · 140 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient facial attribute learning in computer vision, offering an incremental improvement over existing self-supervised techniques.

The paper tackles the problem of learning facial attributes without labeled data by proposing a self-supervised framework that embeds video frames into a common space, achieving performance comparable to or better than state-of-the-art self-supervised methods and approaching supervised methods.

We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, i.e. facial attributes, without having been supervised with any labelled data. We are comparable or superior to state-of-the-art self-supervised methods on these tasks and approach the performance of supervised methods.
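The first contribution, combining information from several source frames through predicted confidence/attention masks, can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each source frame yields one candidate reconstruction of the target frame plus a per-pixel confidence score, and that the scores are normalised with a softmax across source frames before blending. The function name `fuse_with_confidence` and the array shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def fuse_with_confidence(candidates, confidence_logits):
    """Blend per-source candidate frames using per-pixel confidence.

    Illustrative assumption, not the paper's code:
    candidates        -- shape (S, H, W, C), one candidate target frame per source
    confidence_logits -- shape (S, H, W), unnormalised confidence per source and pixel
    Returns the fused frame (H, W, C) and the normalised weights (S, H, W).
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    logits = np.asarray(confidence_logits, dtype=np.float64)

    # Softmax over the source axis, shifted by the max for numerical stability.
    shifted = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum of candidates; weights broadcast over the channel axis.
    fused = (weights[..., None] * candidates).sum(axis=0)
    return fused, weights


# Toy usage: three source frames, 4x4 RGB candidates with random confidences.
rng = np.random.default_rng(0)
candidates = rng.random((3, 4, 4, 3))
logits = rng.normal(size=(3, 4, 4))
fused, weights = fuse_with_confidence(candidates, logits)
```

With equal confidences the fusion reduces to a plain average of the candidates; a source with a much higher confidence at a pixel dominates the output there, which is the behaviour an attention mask is meant to provide.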

Code Implementations: 2 repos
