LiDAM: Semi-Supervised Learning with Localized Domain Adaptation and Iterative Matching
This work addresses the cost of data labeling, a practical concern for machine learning practitioners, though it appears incremental because it builds on existing semi-supervised techniques.
The paper tackles the problem of expensive data labeling by proposing LiDAM, a semi-supervised learning method that combines domain adaptation and self-paced learning to improve model training with limited labeled data. It reports state-of-the-art performance on CIFAR-100: 73.50% accuracy with 2500 labels, compared with 71.82% for FixMatch.
Although data is abundant, data labeling is expensive. Semi-supervised learning methods combine a few labeled samples with a large corpus of unlabeled data to train models effectively. This paper introduces LiDAM, a semi-supervised learning approach rooted in both domain adaptation and self-paced learning. LiDAM first performs localized domain shifts to extract better domain-invariant features for the model, which results in more accurate clusters and pseudo-labels. These pseudo-labels are then aligned with real class labels in a self-paced fashion using a novel iterative matching technique based on majority consistency over high-confidence predictions. Simultaneously, a final classifier is trained to predict ground-truth labels until convergence. LiDAM achieves state-of-the-art performance on the CIFAR-100 dataset, outperforming FixMatch (73.50% vs. 71.82%) when using 2500 labels.
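
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of what one round of majority-consistency matching over high-confidence predictions might look like. The function name `match_clusters_to_classes`, the parameters `conf_threshold` and `min_agreement`, and their default values are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def match_clusters_to_classes(cluster_ids, predicted_classes, confidences,
                              conf_threshold=0.95, min_agreement=0.5):
    """Map cluster ids (pseudo-labels) to class labels by majority vote.

    Illustrative assumption, not the paper's algorithm: a cluster is matched
    to a class only if a strict majority of its high-confidence predictions
    agree on that class.
    """
    # Collect votes per cluster, counting only confident predictions.
    votes = defaultdict(Counter)
    for cluster, cls, conf in zip(cluster_ids, predicted_classes, confidences):
        if conf >= conf_threshold:
            votes[cluster][cls] += 1

    # Keep a cluster-to-class match only when the majority is consistent.
    mapping = {}
    for cluster, counter in votes.items():
        cls, count = counter.most_common(1)[0]
        if count / sum(counter.values()) > min_agreement:
            mapping[cluster] = cls
    return mapping


# Toy data: cluster 2 has two confident but conflicting votes, so it stays unmatched.
cluster_ids = [0, 0, 0, 1, 1, 2, 2]
predicted_classes = [3, 3, 5, 7, 7, 1, 4]
confidences = [0.99, 0.97, 0.96, 0.98, 0.50, 0.99, 0.99]
mapping = match_clusters_to_classes(cluster_ids, predicted_classes, confidences)
```

In a self-paced variant of this sketch, the round would be repeated as the classifier improves, so that clusters left unmatched in one round can be matched in a later one.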