CV LG | Mar 3, 2019

Self-Supervised Learning of Face Representations for Video Face Clustering

Vivek Sharma, Makarand Tapaswi, M. Saquib Sarfraz, Rainer Stiefelhagen

arXiv:1903.01000v19.854 citations | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses character identification in videos for media analysis. The advance is incremental, since it builds on existing deep face models and clustering methods.

The paper tackles video face clustering by proposing a self-supervised Siamese network that distills identity information from pre-trained face representations, achieving state-of-the-art performance on three datasets.

Analyzing the story behind TV series and movies often requires understanding who the characters are and what they are doing. With improving deep face models, this may seem like a solved problem. However, as face detectors get better, clustering/identification needs to be revisited to address increasing diversity in facial appearance. In this paper, we address video face clustering using unsupervised methods. Our emphasis is on distilling the essential information, identity, from the representations obtained using deep pre-trained face networks. We propose a self-supervised Siamese network that can be trained without the need for video/track based supervision, and thus can also be applied to image collections. We evaluate our proposed method on three video face clustering datasets. The experiments show that our methods outperform current state-of-the-art methods on all datasets. Video face clustering is lacking a common benchmark as current works are often evaluated with different metrics and/or different sets of face tracks.
