Frankenstein: Learning Deep Face Representations using Small Data
This work addresses data scarcity in face recognition, particularly for near-infrared (NIR) applications, by enabling effective training on small datasets. It is incremental, however, since it builds on existing synthetic data generation methods.
The paper tackles the problem of training deep face recognition models when large labeled datasets are unavailable, as in near-infrared (NIR) face recognition, by generating synthetic training images that composite real faces. Models learned from as few as 10,000 real training images, expanded through compositing, perform comparably to models trained on 500,000 images, and the approach achieves state-of-the-art results on the CASIA NIR-VIS2.0 heterogeneous face recognition dataset.
Deep convolutional neural networks have recently proven extremely effective for difficult face recognition problems in uncontrolled settings. Training such networks requires very large training sets with millions of labeled images. For some applications, such as near-infrared (NIR) face recognition, such large training datasets are not publicly available and are difficult to collect. In this work, we propose a method to generate very large training datasets of synthetic images by compositing real face images in a given dataset. We show that this method makes it possible to learn models from as few as 10,000 training images that perform on par with models trained from 500,000 images. Using our approach, we also obtain state-of-the-art results on the CASIA NIR-VIS2.0 heterogeneous face recognition dataset.
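The abstract does not spell out how faces are composited. As a rough illustration only, the sketch below assumes aligned 64x64 grayscale faces and swaps fixed rectangular regions (eyes, nose, mouth) between images of different people. The region coordinates, the `PARTS` table, and the helpers `composite_face` and `num_synthetic_identities` are hypothetical, and the hard paste omits any blending a real pipeline would likely need. The sketch also counts the idealized number of part combinations, which shows why compositing can turn a small dataset into a very large one.

```python
import numpy as np

# Hypothetical, non-overlapping part boxes for 64x64 aligned faces:
# (row_start, row_end, col_start, col_end). Illustrative assumption only,
# not the regions used in the paper.
PARTS = {
    "left_eye": (16, 30, 8, 30),
    "right_eye": (16, 30, 34, 56),
    "nose": (30, 44, 22, 42),
    "mouth": (44, 60, 16, 48),
}


def composite_face(base, donors):
    """Return a copy of `base` with each named part pasted from its donor.

    base:   aligned face array of shape (H, W) or (H, W, C).
    donors: dict mapping a key of PARTS -> aligned face with base's shape.
    Parts are hard-pasted with no blending (a simplifying assumption).
    """
    out = base.copy()  # never modify the original image
    for name, donor in donors.items():
        if donor.shape != base.shape:
            raise ValueError(f"donor for {name!r} has shape {donor.shape}, expected {base.shape}")
        r0, r1, c0, c1 = PARTS[name]
        out[r0:r1, c0:c1] = donor[r0:r1, c0:c1]
    return out


def num_synthetic_identities(num_identities, num_parts):
    """Idealized count of composites if the base face and each swappable
    part can come from any identity and every combination is distinct."""
    return num_identities ** (num_parts + 1)


# Usage with random stand-in "faces" (placeholders, not real data).
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(3)]
synthetic = composite_face(
    faces[0],
    {"left_eye": faces[1], "right_eye": faces[1], "mouth": faces[2]},
)
print(synthetic.shape, num_synthetic_identities(100, len(PARTS)))
```

Under the same idealized assumption, 100 identities with four swappable parts yield 100^5 = 10^10 combinations, although in practice many composites may look unnatural or be near-duplicates.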