Semi-supervised deep learning based on label propagation in a 2D embedded space
This work addresses the labor-intensive need for large labeled datasets in image classification and offers a practical solution for domains where labeled data is scarce. The contribution is incremental, however, since it builds on existing semi-supervised learning and embedding techniques.
The paper tackles the problem of training deep neural networks with limited labeled data. It proposes a semi-supervised loop that uses t-SNE and the Optimum-Path Forest classifier to propagate labels in a 2D embedded space, and reports significant classification improvements on test data across five datasets using only 1% to 5% of supervised samples.
While convolutional neural networks need large sets of labeled training images, expert human supervision of such datasets can be very laborious. Proposed solutions propagate labels from a small set of supervised images to a large set of unsupervised ones, obtaining enough truly and artificially labeled samples to train a deep neural network model. Yet such solutions need many supervised images for validation. We present a loop in which a deep neural network (VGG-16) is trained on a labeled set that gains more correctly labeled samples at each iteration. The set is created by using t-SNE to project the features of the network's last max-pooling layer into a 2D embedded space, in which labels are propagated using the Optimum-Path Forest semi-supervised classifier. As the labeled set improves along iterations, so do the features learned by the neural network. We show that this loop can significantly improve classification results on test data (using only 1% to 5% of supervised samples) for three challenging private datasets and two public ones.
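The control flow of the loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`semi_supervised_loop`, `propagate_nearest`) and the iteration count are hypothetical, the three stages are passed in as callables, and the toy stand-ins below replace the real components. In particular, a nearest-labeled-neighbor rule stands in for the Optimum-Path Forest classifier, an identity function stands in for both VGG-16 feature extraction and the t-SNE projection, and restarting propagation from the supervised samples at every iteration is an assumption as well.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Label = Optional[int]  # None marks an unsupervised sample


def propagate_nearest(points_2d: Sequence[Point], labels: Sequence[Label]) -> List[int]:
    """Toy stand-in for OPF: copy the label of the nearest supervised point."""
    seeds = [(p, y) for p, y in zip(points_2d, labels) if y is not None]
    if not seeds:
        raise ValueError("at least one supervised sample is required")
    propagated = []
    for p, y in zip(points_2d, labels):
        if y is not None:
            propagated.append(y)  # supervised labels are kept unchanged
        else:
            _, nearest_label = min(seeds, key=lambda s: math.dist(s[0], p))
            propagated.append(nearest_label)
    return propagated


def semi_supervised_loop(
    samples: Sequence,
    supervised: Sequence[Label],
    train_and_extract: Callable[[Sequence, Sequence[Label]], Sequence],
    project_2d: Callable[[Sequence], Sequence[Point]],
    propagate: Callable[[Sequence[Point], Sequence[Label]], List[int]],
    iterations: int = 3,  # hypothetical default, not taken from the paper
) -> List[Label]:
    """Alternate between training/feature extraction, 2D projection and propagation."""
    labels: List[Label] = list(supervised)  # first pass sees supervised samples only
    for _ in range(iterations):
        # In the paper: train VGG-16 on the current labeled set, then take
        # the features of its last max-pooling layer for every sample.
        features = train_and_extract(samples, labels)
        # In the paper: t-SNE projection of those features into 2D.
        points_2d = project_2d(features)
        # In the paper: OPF semi-supervised label propagation in the 2D space.
        # Assumption: propagation restarts from the supervised samples each time.
        labels = propagate(points_2d, supervised)
    return labels


# Toy usage: two well-separated clusters with one supervised sample each.
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
supervised = [0, None, None, 1, None, None]

result = semi_supervised_loop(
    samples,
    supervised,
    train_and_extract=lambda xs, ys: list(xs),  # stand-in: features are the raw points
    project_2d=lambda feats: list(feats),       # stand-in: data is already 2D
    propagate=propagate_nearest,
)
print(result)
```

With real components, `train_and_extract` would retrain the network on the truly and artificially labeled samples at every pass, so the 2D projection, and with it the propagated labels, could change from one iteration to the next. In the toy version the stand-ins ignore the labels, so every iteration yields the same result.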