A Proximity-Aware Hierarchical Clustering of Faces
This work addresses the problem of clustering faces in unconstrained datasets for computer vision applications, reporting significant improvements over existing methods.
The paper tackles unsupervised face clustering by proposing PAHC, which measures similarity between deep features using linear SVM margins trained on each sample's nearest neighbors, achieving significant improvements over state-of-the-art methods on the CFP, IJB-A, and JANUS CS3 datasets. It also shows that using PAHC to curate the noisy MS-Celeb-1M training data, yielding a curated set of over three million images, significantly improves face verification performance on JANUS CS3 after finetuning.
In this paper, we propose an unsupervised face clustering algorithm called "Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local structure of deep representations. In the proposed method, the similarity between deep features is measured by evaluating linear SVM margins. The SVMs are trained on the nearest neighbors of the sample data and thus do not require any external training data. Clusters are then formed by thresholding the similarity scores. We evaluate clustering performance on three challenging unconstrained face datasets: Celebrity in Frontal-Profile (CFP), IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3). Experimental results demonstrate that the proposed approach achieves significant improvements over state-of-the-art methods. Moreover, we show that the proposed clustering algorithm can be applied to curate a large-scale, noisy training dataset while maintaining a sufficient number of images and their variations due to nuisance factors. Face verification performance on JANUS CS3 improves significantly after finetuning a DCNN model on the curated MS-Celeb-1M dataset, which contains over three million face images.
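The pipeline described in the abstract (neighbor-trained linear SVMs, margin-based similarity, thresholding) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the negatives for each SVM are taken to be the samples farthest from the anchor, the two directed margins are averaged to obtain a symmetric score, thresholding is realized as connected components of the thresholded similarity graph, and the SVM is fitted with a simple subgradient solver. The names `train_linear_svm`, `proximity_similarity`, `threshold_clusters`, `k`, `n_neg`, and `threshold` are all hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Tiny linear SVM (hinge loss + L2 penalty), full-batch subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def proximity_similarity(feats, k=5, n_neg=5):
    """Symmetric similarity built from per-sample linear SVM margins."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    order = np.argsort(-(feats @ feats.T), axis=1)  # most similar first
    n = feats.shape[0]
    margins = np.zeros((n, n))
    for i in range(n):
        pos = order[i, :k + 1]   # k nearest neighbours (normally including i itself)
        neg = order[i, -n_neg:]  # ASSUMPTION: the farthest samples act as negatives
        X = np.vstack([feats[pos], feats[neg]])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
        w, b = train_linear_svm(X, y)
        margins[i] = feats @ w + b  # signed margin of every sample under SVM i
    return 0.5 * (margins + margins.T)  # ASSUMPTION: average the two directions

def threshold_clusters(sim, threshold=0.0):
    """ASSUMPTION: clusters are connected components of the graph sim > threshold."""
    n = sim.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Toy usage: two tight synthetic "identities" in an 8-dimensional feature space.
rng = np.random.default_rng(0)
centers = np.zeros((2, 8))
centers[0, 0] = 1.0
centers[1, 1] = 1.0
feats = np.repeat(centers, 10, axis=0) + 0.02 * rng.standard_normal((20, 8))
sim = proximity_similarity(feats)
labels = threshold_clusters(sim)
```

On real deep features, `k`, `n_neg`, and `threshold` would need tuning, and the quadratic pairwise loop would have to be replaced by something more scalable.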