CV · Jan 19, 2021

An Empirical Study and Analysis on Open-Set Semi-Supervised Learning

Huixiang Luo, Hao Cheng, Fanxu Meng, Yuting Gao, Ke Li, Mengdan Zhang, Xing Sun

arXiv:2101.08237v2 · 10.68 citations

Originality: Incremental advance

AI Analysis

The paper addresses the performance degradation of semi-supervised learning in realistic settings where the unlabeled data contain out-of-distribution samples. It offers an incremental improvement by analyzing and enhancing existing methods.

The paper tackles open-set semi-supervised learning, in which traditional methods degrade because the unlabeled data contain out-of-distribution samples. It proposes Style Disturbance and reports state-of-the-art results on various datasets.

Pseudo-labeling (PL) and Data Augmentation-based Consistency Training (DACT) are two approaches widely used in Semi-Supervised Learning (SSL) methods. By exploiting unlabeled data for efficient training, these methods perform strongly on many machine learning tasks. Yet in a more realistic setting (termed open-set SSL), where the unlabeled dataset contains out-of-distribution (OOD) samples, traditional SSL methods suffer severe performance degradation. Recent approaches mitigate the negative influence of OOD samples by filtering them out of the unlabeled data. However, it is not clear whether directly removing the OOD samples is the best choice. Furthermore, why PL and DACT perform differently in open-set SSL remains unexplained. In this paper, we thoroughly analyze various SSL methods (PL and DACT) on open-set SSL and discuss the pros and cons of each approach. Based on our analysis, we propose Style Disturbance to improve traditional SSL methods on open-set SSL and show experimentally that, by utilizing OOD samples properly, our approach achieves state-of-the-art results on various datasets. We believe our study can bring new insights to SSL research.
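To make the two SSL families named in the abstract concrete, here is a minimal NumPy sketch of generic unlabeled-data losses. It is not the paper's code. The function names, the 0.95 confidence threshold (a FixMatch-style choice), and the squared-error consistency term (Π-model style) are assumptions for illustration. Style Disturbance itself is not sketched, because the abstract does not describe how it works.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax over class logits.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label_loss(logits_unlabeled, threshold=0.95):
    """PL sketch (hypothetical helper): use the model's own argmax as a hard
    target, keeping only samples whose top probability clears the threshold."""
    probs = softmax(logits_unlabeled)
    confidence = probs.max(axis=1)
    targets = probs.argmax(axis=1)
    mask = confidence >= threshold
    # Cross-entropy of each prediction against its own hard pseudo-label.
    ce = -np.log(probs[np.arange(len(targets)), targets] + 1e-12)
    # Masked samples contribute zero; average over the whole unlabeled batch.
    return float((ce * mask).mean()), mask

def consistency_loss(logits_view_a, logits_view_b):
    """DACT sketch (hypothetical helper): penalize disagreement between the
    predicted distributions for two augmented views of the same samples."""
    p_a = softmax(logits_view_a)
    p_b = softmax(logits_view_b)
    return float(((p_a - p_b) ** 2).sum(axis=1).mean())

# Usage: one confident sample and one uncertain sample.
logits = np.array([[8.0, 0.0, 0.0],    # confident: gets hard pseudo-label 0
                   [0.5, 0.4, 0.3]])   # uncertain: masked out of the PL loss
pl_loss, kept = pseudo_label_loss(logits)
print(pl_loss, kept)
print(consistency_loss(logits, logits[::-1]))
```

In an open-set setting, an OOD image that the model happens to score confidently would pass the threshold and receive a hard in-distribution label. The consistency term only asks the two views to agree and supplies no class target. This is one plausible intuition for why the two families can react differently to OOD data. The paper's own analysis should be consulted for its actual findings.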

