Towards Universal Representation Learning for Deep Face Recognition
This paper addresses the challenge of recognizing faces in the wild for applications such as security and surveillance. The contribution is incremental: it builds on existing representation learning and adds new training techniques.
The paper tackles the problem of recognizing faces with extreme variations by proposing a universal representation learning framework that synthesizes training data with variations like low resolution and occlusion, and uses split feature embeddings with confidence values and decorrelation losses. The method achieves top performance on general datasets like LFW and MegaFace, and significantly better results on extreme benchmarks such as TinyFace and IJB-S.
Recognizing wild faces is extremely hard because they appear with all kinds of variations. Traditional methods either train on specifically annotated variation data from target domains or introduce unlabeled target variation data to adapt from the training data. Instead, we propose a universal representation learning framework that can handle larger variations unseen in the given training data without leveraging target domain knowledge. We first synthesize training data with semantically meaningful variations, such as low resolution, occlusion and head pose. However, directly feeding the augmented data into training does not converge well, as the newly introduced samples are mostly hard examples. We propose to split the feature embedding into multiple sub-embeddings and associate a different confidence value with each sub-embedding to smooth the training procedure. The sub-embeddings are further decorrelated by regularizing a variation classification loss and a variation adversarial loss on different partitions of them. Experiments show that our method achieves top performance on general face recognition datasets such as LFW and MegaFace, while performing significantly better on extreme benchmarks such as TinyFace and IJB-S.
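To make two of the ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (a) two simple ways to synthesize low-resolution and occluded training images, and (b) splitting an embedding into sub-embeddings and combining per-sub-embedding cosine similarities using confidence weights. All function names (`downsample_upsample`, `occlude`, `split_embedding`, `confidence_weighted_similarity`) are hypothetical, and the weighting rule (normalized product of the two confidence vectors) is an assumption chosen for illustration; the abstract does not specify the exact formulation.

```python
import numpy as np

def downsample_upsample(img, factor):
    """Simulate low resolution: keep every `factor`-th pixel, then
    repeat pixels to restore the original height and width."""
    h, w = img.shape[:2]
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:h, :w]

def occlude(img, top, left, size):
    """Simulate occlusion by zeroing a square patch (returns a copy)."""
    out = img.copy()
    out[top:top + size, left:left + size] = 0
    return out

def split_embedding(f, k):
    """Split a D-dim vector into k equal sub-embeddings (D must be
    divisible by k) and L2-normalize each one."""
    subs = np.asarray(f, dtype=float).reshape(k, -1)
    norms = np.linalg.norm(subs, axis=1, keepdims=True)
    return subs / np.maximum(norms, 1e-12)

def confidence_weighted_similarity(f_a, c_a, f_b, c_b, k):
    """Assumed scoring rule: cosine similarity per sub-embedding,
    averaged with weights proportional to the product of confidences."""
    sa, sb = split_embedding(f_a, k), split_embedding(f_b, k)
    cos = np.sum(sa * sb, axis=1)            # shape (k,)
    w = np.asarray(c_a, dtype=float) * np.asarray(c_b, dtype=float)
    w = w / w.sum()                          # assumes at least one nonzero weight
    return float(np.dot(w, cos))

# Usage: two embeddings that agree on the first sub-embedding only.
f = np.array([1.0, 0.0, 0.0, 1.0])
g = np.array([1.0, 0.0, 0.0, -1.0])
equal_conf = confidence_weighted_similarity(f, [1, 1], g, [1, 1], k=2)
low_conf_2nd = confidence_weighted_similarity(f, [1, 1], g, [1, 0], k=2)
```

In this sketch, a sub-embedding with low confidence contributes little to the final score, which is one way to read the abstract's claim that confidence values keep hard augmented samples from dominating training.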