Confidence-Guided Data Augmentation for Improved Semi-Supervised Training
This work addresses the problem of limited labeled data for image classification, offering an incremental improvement in semi-supervised learning methods.
The paper aims to improve image classification accuracy and robustness with a confidence-guided data augmentation strategy that generates synthetic images from challenging samples. When used in semi-supervised training, these images lead to improvements over fully supervised baselines on datasets such as STL10 and CIFAR-100.
We propose a new strategy to improve the accuracy and robustness of image classification. First, we train a baseline CNN model. Then, we identify challenging regions in the feature space by collecting all misclassified samples together with correctly classified samples that have low confidence values. These samples are used to train a Variational AutoEncoder (VAE), which is then used to generate synthetic images. Finally, the synthetic images are combined with the original labeled images to train a new model in a semi-supervised fashion. Empirical results on benchmark datasets such as STL10 and CIFAR-100 show that the synthetically generated samples can further diversify the training data, leading to improved image classification compared with fully supervised baselines that use only the available data.
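The sample-selection step described above can be illustrated with a minimal sketch. It assumes that "confidence" means the highest softmax probability from the baseline CNN and that a fixed cutoff separates low from high confidence. The abstract specifies neither, so the function name `select_challenging` and the default `conf_threshold` of 0.6 are hypothetical choices for illustration only.

```python
from typing import List, Sequence


def select_challenging(probs: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       conf_threshold: float = 0.6) -> List[int]:
    """Return indices of samples that are misclassified, or that are
    correctly classified with top-class probability below the threshold.

    Assumption: confidence = maximum softmax probability; the 0.6 default
    is an illustrative value, not one taken from the paper.
    """
    selected = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Predicted class is the index of the largest probability.
        pred = max(range(len(p)), key=p.__getitem__)
        confidence = p[pred]
        if pred != y or confidence < conf_threshold:
            selected.append(i)
    return selected


# Toy softmax outputs for four samples over three classes.
probs = [
    [0.90, 0.05, 0.05],  # correct and confident: not selected
    [0.40, 0.35, 0.25],  # correct but low confidence: selected
    [0.10, 0.80, 0.10],  # misclassified (true label is 0): selected
    [0.20, 0.20, 0.60],  # correct, confidence at the threshold: not selected
]
labels = [0, 0, 0, 2]

hard = select_challenging(probs, labels)
print(hard)  # [1, 2]
```

Under these assumptions, the returned indices would pick out the subset of training images passed on to the VAE. A different confidence measure (for example, the margin between the top two classes) could be substituted without changing the overall structure.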