Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning
This paper addresses the challenge of discovering novel categories in semi-supervised learning, which matters for real-world applications where data is only partially labeled. The contribution appears incremental, however, as it builds on existing methods with specific improvements.
The paper tackles open-world semi-supervised learning, where a model must learn novel categories from unlabeled data while maintaining performance on seen categories. By balancing the learning pace between seen and novel classes, it achieves a 3% average accuracy increase on ImageNet.
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories: the model learns the former faster because it receives accurate supervisory information for them. Capturing the semantics of unlabeled novel-category samples is also difficult because their labels are missing. To address these issues, we introduce 1) an adaptive synchronizing marginal loss, which imposes class-specific negative margins to alleviate the model's bias towards seen classes, and 2) pseudo-label contrastive clustering, which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together in the output space. Extensive experiments on benchmark datasets demonstrate that previous approaches may significantly hinder novel class learning, whereas our method balances the learning pace between seen and novel classes, achieving a 3% average accuracy increase on the ImageNet dataset. Importantly, we find that fine-tuning the self-supervised pre-trained model significantly boosts performance, a factor overlooked in prior literature. Our code is available at https://github.com/yebo0216best/LPS-main.
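The two components named above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation (the linked repository is the reference for that). It assumes a logit-adjustment-style margin rule, `scale * log(freq)`, and a supervised-contrastive-style form for the clustering loss. Both forms, and all function and parameter names, are hypothetical choices made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_sum for z in logits]


def class_margins(pred_freq, scale=1.0, eps=1e-8):
    """Class-specific negative margins from predicted class frequencies.

    ASSUMPTION (not taken from the paper): margin_j = scale * log(freq_j).
    Classes the model rarely predicts (typically novel ones) receive a
    more negative margin than classes it predicts often.
    """
    return [scale * math.log(max(f, eps)) for f in pred_freq]


def margin_cross_entropy(logits, target, margins):
    """Cross-entropy computed on margin-shifted logits."""
    adjusted = [z + m for z, m in zip(logits, margins)]
    return -log_softmax(adjusted)[target]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pseudo_label_contrastive(outputs, pseudo_labels, temperature=0.5):
    """Contrastive loss whose positives are samples sharing a pseudo-label.

    ASSUMPTION: a supervised-contrastive-style objective over output
    vectors; the paper's exact objective may differ.
    """
    n = len(outputs)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {j: cosine(outputs[i], outputs[j]) / temperature
                for j in range(n) if j != i}
        peak = max(sims.values())
        log_denom = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Under this sketch, a sample from a rarely predicted class incurs a larger loss than plain cross-entropy would give for the same logits, so its gradient is stronger. That is one way a margin could speed up novel-class learning relative to seen classes. The contrastive term is smaller when samples that share a pseudo-label already lie close together in the output space.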