CV · Nov 3, 2023

ProS: Facial Omni-Representation Learning via Prototype-based Self-Distillation

arXiv:2311.01929v2 · 7 citations · h-index: 14
Originality: Incremental advance
AI Analysis

The paper addresses data collection and privacy challenges in facial analysis by enabling effective learning without annotated data, though the advance is incremental because it builds on existing self-distillation and prototype techniques.

This paper tackles the problem of unsupervised face representation learning by proposing ProS, a prototype-based self-distillation method that leverages unlabeled face images, achieving state-of-the-art performance on tasks like attribute estimation and expression recognition in both full and few-shot settings.

This paper presents Prototype-based Self-Distillation (ProS), a novel approach to unsupervised face representation learning. Existing supervised methods rely heavily on large amounts of annotated facial training data, which raises data-collection and privacy concerns. To address these issues, we propose ProS, which leverages a vast collection of unlabeled face images to learn a comprehensive facial omni-representation. In particular, ProS consists of two vision transformers (a teacher and a student model) that are trained on differently augmented images (cropping, blurring, coloring, etc.). In addition, we build a face-aware retrieval system that, together with the augmentations, yields curated images consisting predominantly of facial areas. To make the learned features more discriminative, we introduce a prototype-based matching loss that aligns the similarity distributions between features (teacher or student) and a set of learnable prototypes. After pre-training, the teacher vision transformer serves as the backbone for downstream tasks, including attribute estimation, expression recognition, and landmark alignment, which are handled by simple fine-tuning with additional layers. Extensive experiments demonstrate that our method achieves state-of-the-art performance on various tasks in both full and few-shot settings. Furthermore, we investigate pre-training with synthetic face images, and ProS exhibits promising performance in this scenario as well.
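
The prototype-based matching loss described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: cosine similarity between a feature and each prototype, a temperature-scaled softmax to turn similarities into a distribution, and a cross-entropy between the teacher and student distributions. The function names and temperature values are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_distribution(feature, prototypes, temperature):
    """Similarity distribution of one feature over all prototypes.

    Assumption: cosine similarity followed by a temperature-scaled softmax.
    """
    f = l2_normalize(feature)
    sims = [sum(a * b for a, b in zip(f, l2_normalize(p))) for p in prototypes]
    return softmax([s / temperature for s in sims])


def prototype_matching_loss(student_feat, teacher_feat, prototypes,
                            student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student prototype distributions.

    Assumption: the teacher distribution is the target, and both temperature
    values are illustrative defaults, not values from the paper.
    """
    p_student = prototype_distribution(student_feat, prototypes, student_temp)
    p_teacher = prototype_distribution(teacher_feat, prototypes, teacher_temp)
    return -sum(t * math.log(s + 1e-12) for t, s in zip(p_teacher, p_student))


# Toy usage: two 2-D prototypes; in real training they would be learnable.
toy_prototypes = [[1.0, 0.0], [0.0, 1.0]]
aligned = prototype_matching_loss([0.9, 0.1], [1.0, 0.0], toy_prototypes)
misaligned = prototype_matching_loss([0.1, 0.9], [1.0, 0.0], toy_prototypes)
print(aligned, misaligned)
```

In this sketch the loss is small when the student and teacher features are most similar to the same prototype and large when they favor different ones. Using a lower temperature for the teacher than for the student sharpens the target distribution; this is a common choice in self-distillation setups and is assumed here rather than confirmed by the abstract.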

