Merge or Not? Learning to Group Faces via Imitation Learning
This work addresses the challenge of accurately clustering face images into identities, particularly in the presence of profile faces and noisy data. The contribution is incremental: it builds on existing clustering methods by introducing a dynamic decision-making approach.
The paper tackles the problem of face grouping from unlabeled images by formulating a novel framework that uses imitation learning via inverse reinforcement learning to make sequential merge decisions, achieving superior performance over existing baselines on three benchmark datasets.
Given a large number of unlabeled face images, face grouping aims at clustering the images into the individual identities present in the data. This task remains challenging despite the remarkable capability of deep learning approaches in learning face representations. In particular, grouping results can still be egregiously wrong in the presence of profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns a clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a. apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decisions, dynamically deciding when to merge two face instances/groups, driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines.
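To make the idea of sequential merge decisions concrete, the following is a minimal illustrative sketch and not the framework described in this paper. It assumes each face is already represented by a feature vector, and it replaces the reward function that would be recovered via IRL with two hand-written stand-ins: a short-term term (centroid similarity of the candidate pair) and a long-term term (the worst pairwise similarity inside the merged group). The names `group_faces`, `short_term_reward`, `long_term_reward`, `threshold`, and `weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def short_term_reward(g1, g2, feats, threshold):
    # Hand-written stand-in: how similar the two group centroids are,
    # measured relative to an assumed similarity threshold.
    c1 = feats[g1].mean(axis=0)
    c2 = feats[g2].mean(axis=0)
    return cosine(c1, c2) - threshold


def long_term_reward(g1, g2, feats, threshold):
    # Hand-written stand-in: the least similar pair inside the merged
    # group, as a crude proxy for how much the merge could hurt later.
    merged = g1 + g2
    worst = min(
        cosine(feats[i], feats[j])
        for i in merged
        for j in merged
        if i < j
    )
    return worst - threshold


def group_faces(feats, threshold=0.5, weight=0.5):
    """Sequentially decide whether to merge the most promising pair."""
    groups = [[i] for i in range(len(feats))]
    while len(groups) > 1:
        # Pick the candidate pair with the highest short-term reward.
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                s = short_term_reward(groups[a], groups[b], feats, threshold)
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        # Decision: merge only if the combined reward is positive.
        value = s + weight * long_term_reward(groups[a], groups[b], feats, threshold)
        if value <= 0:
            break  # the agent chooses "not merge" and stops
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


if __name__ == "__main__":
    # Four toy 2-D "face features" forming two well-separated identities.
    feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
    print(group_faces(feats))
```

In this sketch the decision rule is fixed by hand; in an imitation-learning setting the reward would instead be learned from demonstrations of correct merge and not-merge actions, which is the part this example deliberately leaves out.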