LGMLAug 11, 2020

S2OSC: A Holistic Semi-Supervised Approach for Open Set Classification

arXiv:2008.04662v127 citations
AI Analysis

It addresses the challenge of distinguishing in-class from out-of-class data in machine learning, which is crucial for real-world applications, but the approach appears incremental as it builds on existing semi-supervised and transductive methods.

The paper tackles the embedding confusion problem in open set classification by proposing S2OSC, a semi-supervised approach that filters out-of-class instances and retrains models, achieving state-of-the-art performance with 85.4% F1 on CIFAR-10 using only 300 pseudo-labels.

Open set classification (OSC) tackles the problem of determining whether the data are in-class or out-of-class during inference, when only provided with a set of in-class examples at training time. Traditional OSC methods usually train discriminative or generative models with in-class data, then utilize the pre-trained models to classify test data directly. However, these methods always suffer from embedding confusion problem, i.e., partial out-of-class instances are mixed with in-class ones of similar semantics, making it difficult to classify. To solve this problem, we unify semi-supervised learning to develop a novel OSC algorithm, S2OSC, that incorporates out-of-class instances filtering and model re-training in a transductive manner. In detail, given a pool of newly coming test data, S2OSC firstly filters distinct out-of-class instances using the pre-trained model, and annotates super-class for them. Then, S2OSC trains a holistic classification model by combing in-class and out-of-class labeled data and remaining unlabeled test data in semi-supervised paradigm, which also integrates pre-trained model for knowledge distillation to further separate mixed instances. Despite its simplicity, the experimental results show that S2OSC achieves state-of-the-art performance across a variety of OSC tasks, including 85.4% of F1 on CIFAR-10 with only 300 pseudo-labels. We also demonstrate how S2OSC can be expanded to incremental OSC setting effectively with streaming data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes