CV, LG · Mar 3, 2019

Self-Supervised Learning of Face Representations for Video Face Clustering

arXiv:1903.01000v1 · 54 citations
Originality: Incremental advance
AI Analysis

The paper addresses character identification in videos for media analysis. Its contribution is incremental, since it builds on existing deep face models and clustering methods.

The paper tackles video face clustering by proposing a self-supervised Siamese network that distills identity information from pre-trained face representations, achieving state-of-the-art performance on three datasets.
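Because the Siamese network is trained without track labels, its training pairs have to be derived from the features themselves. A minimal sketch of one such self-supervision scheme follows. It assumes that, within a batch of pre-trained face embeddings, each sample's nearest neighbour is treated as a pseudo-positive and its farthest sample as a pseudo-negative, scored with the standard contrastive loss. The function names, the Euclidean distance, and the margin of 1.0 are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_pairs(batch: List[Vector]) -> Tuple[List[Pair], List[Pair]]:
    """Assumed mining rule: pair every sample with its nearest neighbour
    (pseudo-positive) and its farthest sample (pseudo-negative) in the batch."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for i, anchor in enumerate(batch):
        # (distance, index) tuples; ties are broken by the lower index.
        dists = [(euclidean(anchor, other), j) for j, other in enumerate(batch) if j != i]
        positives.append((i, min(dists)[1]))
        negatives.append((i, max(dists)[1]))
    return positives, negatives


def contrastive_loss(d: float, same: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss: pull positives together and push
    negatives at least `margin` apart."""
    return d ** 2 if same else max(0.0, margin - d) ** 2


def batch_loss(batch: List[Vector], margin: float = 1.0) -> float:
    """Mean contrastive loss over all mined pseudo-pairs in the batch."""
    positives, negatives = mine_pairs(batch)
    terms = [contrastive_loss(euclidean(batch[i], batch[j]), True, margin) for i, j in positives]
    terms += [contrastive_loss(euclidean(batch[i], batch[j]), False, margin) for i, j in negatives]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two well-separated hypothetical "identities" in a toy 2-D feature space.
    toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(mine_pairs(toy))
    print(batch_loss(toy))
```

In a full training loop the distances would be computed on the outputs of the Siamese network being trained, so the mined pairs would change as the embedding improves. Here they are computed on fixed vectors only to keep the mining step visible.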

Analyzing the story behind TV series and movies often requires understanding who the characters are and what they are doing. With improving deep face models, this may seem like a solved problem. However, as face detectors get better, clustering/identification needs to be revisited to address increasing diversity in facial appearance. In this paper, we address video face clustering using unsupervised methods. Our emphasis is on distilling the essential information, identity, from the representations obtained using deep pre-trained face networks. We propose a self-supervised Siamese network that can be trained without the need for video/track-based supervision, and thus can also be applied to image collections. We evaluate our proposed method on three video face clustering datasets. The experiments show that our methods outperform current state-of-the-art methods on all datasets. Video face clustering lacks a common benchmark, as current works are often evaluated with different metrics and/or different sets of face tracks.
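Once identity-focused embeddings are available, the remaining steps are to cluster them and score the result. The sketch below uses naive average-linkage agglomerative clustering with a known number of characters, then computes weighted clustering purity: the fraction of faces that share their cluster's majority identity. Both the clustering algorithm and the metric are assumptions chosen for illustration. As noted above, published results differ in metrics and track sets, so numbers from different papers are not directly comparable.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(features: List[Sequence[float]], n_clusters: int) -> List[int]:
    """Naive O(n^3) average-linkage clustering: repeatedly merge the two
    clusters with the smallest mean pairwise distance until `n_clusters` remain."""
    clusters: List[List[int]] = [[i] for i in range(len(features))]

    def linkage(a: List[int], b: List[int]) -> float:
        total = sum(euclidean(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None  # (distance, index_x, index_y) with index_x < index_y
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]  # safe: y > x, so index x is unaffected

    labels = [0] * len(features)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


def weighted_cluster_purity(pred: List[int], truth: List[Hashable]) -> float:
    """Sum, over predicted clusters, of the size of the majority identity,
    divided by the total number of faces."""
    groups: Dict[int, List[Hashable]] = {}
    for p, t in zip(pred, truth):
        groups.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1] for members in groups.values())
    return majority / len(truth)


if __name__ == "__main__":
    # Hypothetical embeddings for two characters, "A" and "B".
    feats = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0], [0.1, 0.1]]
    ids = ["A", "A", "B", "B", "A"]
    labels = agglomerative(feats, n_clusters=2)
    print(labels, weighted_cluster_purity(labels, ids))
```

Average linkage is used here only because it is simple to write out; a real evaluation on thousands of tracks would call an optimized library implementation instead of this cubic loop.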

Code Implementations: 1 repo
