Progressive Representative Labeling for Deep Semi-Supervised Learning
This paper addresses a challenge that matters to practitioners: using unlabeled data effectively in deep learning. The contribution is incremental, since it builds on existing pseudo-labeling approaches.
The paper aims to improve deep semi-supervised learning by labeling only the most representative samples, using a graph neural network labeler. The method achieves state-of-the-art results, including 72.1% top-1 accuracy on ImageNet with 10% labeled data, surpassing the previous best by 3.3%.
Deep semi-supervised learning (SSL) has attracted significant attention in recent years as a way to leverage large amounts of unlabeled data to improve the performance of deep learning when labeled data are limited. Pseudo-labeling is a popular approach to expanding the labeled dataset. However, whether there is a more effective way of labeling remains an open problem. In this paper, we propose to label only the most representative samples to expand the labeled set. Representative samples are selected by the indegree of their corresponding nodes on a directed k-nearest neighbor (kNN) graph; that is, they lie in the k-nearest neighborhood of many other samples. We design a graph neural network (GNN) labeler to label them in a progressive learning manner. Aided by the progressive GNN labeler, our deep SSL approach outperforms state-of-the-art methods on several popular SSL benchmarks, including CIFAR-10, SVHN, and ILSVRC-2012. Notably, we achieve 72.1% top-1 accuracy, surpassing the previous best result by 3.3%, on the challenging ImageNet benchmark with only 10% labeled data.
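The indegree-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance metric, the feature extractor, or how many samples are kept, so Euclidean distance, the top-m cutoff, and the function names `knn_indegree` and `select_representatives` are all assumptions made here for illustration.

```python
import numpy as np


def knn_indegree(features, k):
    """Indegree of each node on a directed kNN graph.

    Assumption: Euclidean distance on the rows of `features`.
    Node i has an edge to each of its k nearest neighbors, so the
    indegree of node j counts how many other samples have j among
    their k nearest neighbors.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (features @ features.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    # Indices of the k nearest neighbors of every sample (outgoing edges)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Count incoming edges per node
    return np.bincount(neighbors.ravel(), minlength=n)


def select_representatives(features, k, m):
    """Return indices of the m samples with the highest indegree.

    Assumption: a simple top-m rule; the selection criterion in the
    paper may differ.
    """
    indegree = knn_indegree(features, k)
    order = np.argsort(-indegree, kind="stable")
    return order[:m], indegree
```

For example, calling `select_representatives(features, k=1, m=1)` on the four one-dimensional points 0.0, 1.0, 2.1, and 10.0 picks the point at 1.0, because it is the nearest neighbor of both 0.0 and 2.1. The dense distance matrix keeps the sketch short but scales quadratically with the number of samples; a dataset the size of ImageNet would need an approximate nearest-neighbor index instead.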