Do We Really Need to Collect Millions of Faces for Effective Face Recognition?
This work addresses the data-collection bottleneck for face recognition systems by offering a more accessible alternative to gathering and manually labeling more images. The contribution is incremental, since it builds on existing synthesis methods.
The paper tackles the need for massive labeled face datasets in face recognition by synthesizing facial variations from an existing dataset instead of collecting more images. The approach matches state-of-the-art results on benchmarks such as LFW and IJB-A without training on millions of real images.
Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes -- huge numbers of face images downloaded and labeled for identity -- it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training data sizes for face recognition systems. Rather than manually harvesting and labeling more faces, we simply synthesize them. We describe novel methods of enriching an existing dataset with important facial appearance variations by manipulating the faces it contains. We further apply this synthesis approach when matching query images represented using a standard convolutional neural network. The effect of training and testing with synthesized images is extensively tested on the LFW and IJB-A (verification and identification) benchmarks and Janus CS2. The performance obtained by our approach matches state-of-the-art results reported by systems trained on millions of downloaded images.
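As a rough illustration of using synthesized views when matching, the sketch below compares two faces that are each represented by several feature vectors, one per original or synthesized view. Everything here is an assumption made for illustration: the names `cosine` and `pooled_similarity` are hypothetical, the toy 3-D vectors stand in for CNN features, and mean pooling of pairwise cosine scores is one simple fusion choice, not necessarily the rule used in the paper.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat a zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def pooled_similarity(query_views, gallery_views):
    """Average cosine similarity over all (query view, gallery view) pairs.

    Each argument is a list of feature vectors, one per original or
    synthesized view of a face. Mean pooling is an assumed fusion rule.
    """
    scores = [cosine(q, g) for q, g in product(query_views, gallery_views)]
    if not scores:
        raise ValueError("each face needs at least one view")
    return sum(scores) / len(scores)


# Toy features (hypothetical): two views per face.
same_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
same_b = [[1.0, 0.1, 0.0], [0.8, 0.2, 0.0]]
other = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.3]]

# Views of the same identity should score higher than views of another.
print(pooled_similarity(same_a, same_b) > pooled_similarity(same_a, other))
```

Pooling over all view pairs means a single poorly synthesized view has limited influence on the final score; other fusion rules, such as taking the maximum, would trade that robustness for sensitivity to the best-matching pair.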