Improving Open-Set Semi-Supervised Learning with Self-Supervision
This work addresses a challenge that machine learning practitioners face in deployment: handling uncurated data. The contribution is incremental, since it builds on existing open-set semi-supervised learning (OSSL) methods, although the approach itself is novel.
The paper tackles open-set semi-supervised learning, where the unlabeled data includes classes not present in the labeled set. It proposes a framework that uses self-supervision to learn from all unlabeled data, together with an energy-based score for recognizing data from the known classes. The method achieves state-of-the-art closed-set accuracy and open-set recognition on many of the evaluated benchmark problems.
Open-set semi-supervised learning (OSSL) embodies a practical scenario within semi-supervised learning, wherein the unlabeled training set encompasses classes absent from the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data belonging to unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. We show through extensive experimental evaluations that our method yields state-of-the-art results on many of the evaluated benchmark problems in terms of closed-set accuracy and open-set recognition when compared with existing methods for OSSL. Our code is available at https://github.com/walline/ssl-tf2-sefoss.
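To make the energy-based score mentioned above more concrete, the following is a minimal sketch of one common formulation, the free energy of a classifier's logits, E(x) = -T log sum_k exp(f_k(x) / T). This is an assumption for illustration: the abstract does not state the exact score used, and the names `energy_score`, `is_known`, `temperature`, and `threshold` are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T), one value per row.

    Assumed formulation for illustration; not necessarily the paper's score.
    Lower energy means the classifier puts more mass on some known class.
    """
    z = np.asarray(logits, dtype=np.float64) / temperature
    # Subtract the row maximum before exponentiating for numerical stability.
    m = z.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def is_known(logits, threshold, temperature=1.0):
    """Flag samples whose energy falls below a (hypothetical) threshold."""
    return energy_score(logits, temperature) < threshold

# Usage: a confident prediction versus a flat, uninformative one.
logits = np.array([[10.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
print(energy_score(logits))
print(is_known(logits, threshold=-5.0))
```

Under this convention a confident prediction receives a lower energy than a flat one, so thresholding the score separates likely known-class samples from likely unknown ones. The threshold would in practice be chosen on held-out data.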