Improved Training for Self-Training by Confidence Assessments
This work addresses the challenge of expensive data labeling in tasks like semantic segmentation, though it appears incremental as it builds on existing self-training techniques.
The paper tackles the problem of insufficient labeled training data by proposing a self-training method that uses high-confidence predictions on unlabeled data as pseudo-labels for online learning. It demonstrates this approach on MNIST for classification and ADE20K for semantic segmentation, showing improved training but without reporting specific numerical results.
It is well known that for some tasks, labeled data sets are hard to gather. We therefore set out to tackle the problem of insufficient training data. We examined methods for learning from unlabeled data after an initial training on a limited labeled data set. The suggested approach can be used as an online learning method on the unlabeled test set. In the general classification task, whenever we predict a label with high enough confidence, we treat it as the true label and train on that sample accordingly. For the semantic segmentation task, a classic example of an expensive data labeling process, we do so pixel-wise. We applied the suggested approaches to the MNIST data set as a proof of concept for a vision classification task, and to the ADE20K data set to tackle the semi-supervised semantic segmentation problem.
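The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that confidence is the largest predicted class probability, that the threshold is 0.95, and that rejected samples or pixels are marked with a hypothetical IGNORE value so that a training loop could skip them. None of these details are given in the text above.

```python
from typing import List, Sequence, Tuple

IGNORE = -1  # hypothetical marker for predictions excluded from training


def pseudo_label(probs: Sequence[float], threshold: float) -> int:
    """Return the most probable class if it reaches `threshold`, else IGNORE.

    Assumption: confidence is the maximum class probability.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else IGNORE


def select_pseudo_labels(
    batch: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Classification case: keep (sample index, label) for confident samples."""
    labels = [pseudo_label(p, threshold) for p in batch]
    return [(i, y) for i, y in enumerate(labels) if y != IGNORE]


def pseudo_label_map(
    pixel_probs: Sequence[Sequence[Sequence[float]]], threshold: float = 0.95
) -> List[List[int]]:
    """Segmentation case: apply the same rule independently to every pixel."""
    return [[pseudo_label(p, threshold) for p in row] for row in pixel_probs]


# Three unlabeled samples with class probabilities from a trained model.
batch = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.02, 0.02, 0.96]]
confident = select_pseudo_labels(batch)

# A 2x2 "image" with two classes per pixel.
pixels = [[[0.99, 0.01], [0.60, 0.40]], [[0.03, 0.97], [0.50, 0.50]]]
label_map = pseudo_label_map(pixels)
```

In this sketch, the confident pairs would be fed back as ordinary labeled examples, while pixels marked IGNORE would contribute nothing to the loss. How strict the threshold should be is a design choice: a higher value admits fewer pseudo-labels but reduces the risk of training on wrong ones.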