GhostVLAD for set-based face recognition
This work addresses template-based face recognition, a domain-specific problem, with incremental improvements in handling image quality and representation efficiency.
The paper tackles the problem of learning compact representations for image sets in template-based face recognition, proposing a GhostVLAD layer with ghost clusters that automatically weights input face quality and achieves state-of-the-art performance on the IJB-B dataset.
The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and enables efficient similarity computation. Second, we propose a novel GhostVLAD layer that includes {\em ghost clusters}, which do not contribute to the aggregation. We show that a quality weighting on the input faces emerges automatically, such that informative images contribute more than those of low quality, and that the ghost clusters enhance the network's ability to deal with poor-quality images. Third, we explore how the input feature dimension, the number of clusters, and different training techniques affect recognition performance. Given this analysis, we train a network that far exceeds the state of the art on the IJB-B face recognition dataset, currently one of the most challenging public benchmarks, on both the identification and verification protocols.
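The aggregation idea described above can be illustrated with a minimal sketch. The abstract does not give the layer's equations, so the sketch assumes a NetVLAD-style formulation: each descriptor is softly assigned over all clusters (real and ghost) with a softmax, but residuals are accumulated only for the real clusters. The function names, the parameter layout (ghost clusters stored last), and the normalization steps are illustrative assumptions, not the authors' implementation, and plain Python lists are used only to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (an all-zero vector stays zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def ghostvlad_aggregate(descriptors, centers, weights, biases, num_ghost):
    """Aggregate N descriptors of dimension D into one K*D vector.

    Assumed layout (hypothetical): centers and weights have K + G rows,
    biases has K + G entries, and the last num_ghost rows are ghosts.
    """
    num_real = len(centers) - num_ghost
    dim = len(descriptors[0])
    vlad = [[0.0] * dim for _ in range(num_real)]
    for x in descriptors:
        # Soft assignment is computed over real AND ghost clusters.
        scores = [
            sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)
        ]
        assign = softmax(scores)
        # Only real clusters accumulate residuals; mass assigned to
        # ghost clusters is simply dropped.
        for k in range(num_real):
            for j in range(dim):
                vlad[k][j] += assign[k] * (x[j] - centers[k][j])
    # Assumed post-processing: per-cluster normalization, then global L2.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


# Toy setup: D = 2, K = 2 real clusters, G = 1 ghost cluster (last row).
centers = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights = [[10.0, 0.0], [0.0, 10.0], [-10.0, -10.0]]
biases = [0.0, 0.0, 0.0]

good = [2.0, 0.0]    # scores highest on real cluster 0
bad = [-1.0, -1.0]   # scores highest on the ghost cluster

clean = ghostvlad_aggregate([good], centers, weights, biases, num_ghost=1)
noisy = ghostvlad_aggregate([good, bad], centers, weights, biases, num_ghost=1)
print(clean)
print(noisy)
```

In this toy setup the second descriptor is assigned almost entirely to the ghost cluster, so adding it barely changes the aggregated vector. That is one way to read the automatic quality weighting the abstract describes; in the actual network the assignment parameters are learned end to end rather than set by hand.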