CV | Sep 26, 2024

SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

arXiv:2409.17512v1 | 7 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in open-set semi-supervised learning and offers an incremental improvement over existing methods.

The paper tackles overtrusting in open-set semi-supervised learning, where prior methods overfit because of distribution bias in the labeled data. It proposes SCOMatch, which treats out-of-distribution samples as an additional class, and reports significant performance improvements over state-of-the-art methods on various benchmarks.

Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods first learn the decision boundary between ID and OOD samples from labeled ID data, then employ self-training to refine this boundary. These methods, however, tend to overtrust the labeled ID data: the scarcity of labeled data causes a distribution bias between the labeled samples and the entire ID data, which leads the decision boundary to overfit. The subsequent self-training process, based on the overfitted result, fails to rectify this problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary between ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms state-of-the-art methods on various benchmarks. Its effectiveness is further verified through ablation studies and visualization.
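
To make the idea of an OOD memory queue more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the update strategy, so this version simply assumes that each unlabeled sample comes with a scalar OOD confidence score and that the queue keeps the highest-scoring samples up to a fixed capacity. The class name `OODMemoryQueue`, its methods, and the scoring convention are all hypothetical. The only part taken from the abstract is that the stored samples are labeled as one additional class alongside the ID classes.

```python
import heapq
from typing import Any, Iterable, List, Tuple


class OODMemoryQueue:
    """Illustrative fixed-size store of the most confidently-OOD samples.

    Assumption: a higher score means "more likely OOD". The real update
    strategy used by SCOMatch may differ.
    """

    def __init__(self, capacity: int, num_id_classes: int) -> None:
        self.capacity = capacity
        # ID classes use labels 0..K-1, so the extra OOD class gets label K.
        self.ood_label = num_id_classes
        # Min-heap keyed on score: the weakest stored candidate sits at index 0.
        self._heap: List[Tuple[float, int, Any]] = []
        # Unique counter breaks score ties so samples are never compared.
        self._counter = 0

    def update(self, samples: Iterable[Any], ood_scores: Iterable[float]) -> None:
        """Offer new candidates; keep only the top-`capacity` by score."""
        for sample, score in zip(samples, ood_scores):
            entry = (float(score), self._counter, sample)
            self._counter += 1
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, entry)
            elif entry[0] > self._heap[0][0]:
                # Evict the least confident stored sample.
                heapq.heapreplace(self._heap, entry)

    def labeled_ood(self) -> List[Tuple[Any, int]]:
        """Return stored samples paired with the extra OOD class label."""
        ordered = sorted(self._heap, reverse=True)
        return [(sample, self.ood_label) for _, _, sample in ordered]


# Usage: 6 ID classes, so OOD samples receive label 6.
queue = OODMemoryQueue(capacity=2, num_id_classes=6)
queue.update(["a", "b", "c"], [0.2, 0.9, 0.5])
queue.update(["d"], [0.7])
result = queue.labeled_ood()
print(result)  # [('b', 6), ('d', 6)]
```

In a training loop, the pairs returned by `labeled_ood()` could be mixed into the labeled batch so that a classifier with K+1 outputs sees examples of the extra class. How that is combined with the closed-set objective is the part the abstract attributes to Simultaneous Close-set and Open-set self-training, which this sketch does not attempt to reproduce.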

Code Implementations: 1 repo