LG · AI · Oct 4, 2023

SemiReward: A General Reward Model for Semi-supervised Learning

arXiv:2310.03013v2 · 25 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses confirmation bias in semi-supervised learning and offers researchers and practitioners a pluggable component that improves on existing methods. It is an incremental advance because it builds on the established self-training framework.

The paper tackles the challenge of distinguishing high-quality pseudo labels in semi-supervised learning. It proposes SemiReward, a reward model that scores pseudo labels so that only high-quality ones are used for training, and reports significant performance gains and faster convergence on 13 benchmarks spanning classification and regression tasks.

Semi-supervised learning (SSL) has witnessed great progress through various improvements to the self-training framework with pseudo labeling. The main challenge is how to distinguish high-quality pseudo labels while avoiding confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies designed specifically for classification, and they fail to achieve high-quality labels, fast convergence, and task versatility simultaneously. To address these limitations, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate pseudo labels and select the high-quality ones; it can be plugged into mainstream SSL methods across a wide range of task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and a subsampling strategy. Extensive experiments on classification and regression tasks across 13 standard SSL benchmarks in three modalities verify that SemiReward achieves significant performance gains and faster convergence over Pseudo Label, FlexMatch, and Free/SoftMatch. Code and models are available at https://github.com/Westlake-AI/SemiReward.
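The reward-based selection step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the SemiReward implementation from the linked repository: the learned reward model is replaced by any callable `reward_fn`, the training target `reward_target` (cosine similarity between a pseudo label and the ground-truth label on labeled data) is an assumption, and the default threshold (the batch-mean reward) is a hypothetical choice. All function names here are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical signature: scores a (sample, pseudo label) pair.
# In SemiReward this role is played by a learned reward model.
RewardFn = Callable[[Vector, Vector], float]


def reward_target(pseudo_label: Vector, true_label: Vector) -> float:
    """Assumed training target on labeled data: cosine similarity
    between a pseudo label and the ground-truth label."""
    dot = sum(p * t for p, t in zip(pseudo_label, true_label))
    norm = math.sqrt(sum(p * p for p in pseudo_label)) * math.sqrt(
        sum(t * t for t in true_label)
    )
    return dot / norm if norm > 0 else 0.0


def filter_pseudo_labels(
    samples: List[Vector],
    pseudo_labels: List[Vector],
    reward_fn: RewardFn,
    threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Return (index, reward) for pseudo labels whose reward >= threshold.

    If no threshold is given, the batch-mean reward is used
    (an illustrative assumption, not taken from the source).
    """
    rewards = [reward_fn(x, y) for x, y in zip(samples, pseudo_labels)]
    if not rewards:
        return []
    tau = threshold if threshold is not None else sum(rewards) / len(rewards)
    return [(i, r) for i, r in enumerate(rewards) if r >= tau]


def toy_reward(x: Vector, y: Vector) -> float:
    """Placeholder for a trained reward model: the peak class probability."""
    return max(y)


if __name__ == "__main__":
    xs = [[0.1], [0.2], [0.3]]
    ys = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    print(filter_pseudo_labels(xs, ys, toy_reward))  # → [(0, 0.9), (2, 0.8)]
```

Keeping the reward function as a plain callable separates the selection rule from whatever model produces the scores, which mirrors the abstract's claim that the reward component is pluggable into different SSL methods.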

Code Implementations: 1 repo