CV | LG | Jun 29, 2022

On Non-Random Missing Labels in Semi-Supervised Learning

arXiv:2206.14923v1 | 22 citations | h-index: 75 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses label bias in semi-supervised learning, a realistic challenge for practitioners, and offers an incremental improvement over prior methods by explicitly incorporating class information.

The paper tackles semi-supervised learning in which labels are missing not at random (MNAR) because of class biases. It proposes a class-aware method that, per the authors, significantly outperforms existing baselines and other label bias removal methods under various MNAR settings.

Semi-Supervised Learning (SSL) is fundamentally a missing label problem, in which the label Missing Not At Random (MNAR) problem is more realistic and challenging than the widely adopted yet naive Missing Completely At Random (MCAR) assumption, where labeled and unlabeled data share the same class distribution. Unlike existing SSL solutions, which overlook the role of "class" in causing the non-randomness (e.g., users are more likely to label popular classes), we explicitly incorporate "class" into SSL. Our method is three-fold: 1) We propose Class-Aware Propensity (CAP), which exploits the unlabeled data to train an improved classifier using the biased labeled data. 2) To encourage rare class training, where the model is low-recall but high-precision and therefore discards too many pseudo-labeled data, we propose Class-Aware Imputation (CAI), which dynamically decreases (or increases) the pseudo-label assignment threshold for rare (or frequent) classes. 3) Overall, we integrate CAP and CAI into a Class-Aware Doubly Robust (CADR) estimator for training an unbiased SSL model. Under various MNAR settings and ablations, our method not only significantly outperforms existing baselines but also surpasses other label bias removal SSL methods. Please check our code at: https://github.com/JoyHuYY1412/CADR-FixMatch.
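
The abstract names two class-aware ingredients without giving formulas: propensity weighting (CAP) and per-class pseudo-label thresholds (CAI). The sketch below illustrates the general idea only and is not the authors' implementation. The function names, the scaling rule `base_tau * (freq / max_freq) ** gamma`, the parameters `base_tau` and `gamma`, and the assumption that per-class totals are available (in practice they would have to be estimated) are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def class_aware_thresholds(class_freq: Sequence[float],
                           base_tau: float = 0.95,
                           gamma: float = 0.5) -> List[float]:
    """Per-class confidence thresholds (hypothetical rule, not from the paper).

    The most frequent class keeps base_tau; rarer classes get a lower
    threshold, so fewer of their pseudo-labels are discarded.
    """
    max_freq = max(class_freq)
    return [base_tau * (f / max_freq) ** gamma for f in class_freq]


def assign_pseudo_labels(probs: Sequence[Sequence[float]],
                         thresholds: Sequence[float]) -> List[Optional[int]]:
    """Keep the argmax class only if its probability clears that class's threshold."""
    labels: List[Optional[int]] = []
    for p in probs:
        c = max(range(len(p)), key=lambda k: p[k])  # predicted class
        labels.append(c if p[c] >= thresholds[c] else None)
    return labels


def inverse_propensity_weights(n_labeled: Sequence[int],
                               n_total: Sequence[int]) -> List[float]:
    """Weight for class c = 1 / P(labeled | class c).

    Assumes every class has at least one labeled sample and that n_total
    (labeled + unlabeled count per class) is known or estimated elsewhere.
    """
    return [total / labeled for labeled, total in zip(n_labeled, n_total)]


# Toy example: class 0 is frequent (80%), class 1 is rare (20%).
thresholds = class_aware_thresholds([0.8, 0.2])
probs = [[0.90, 0.10], [0.40, 0.60], [0.97, 0.03]]
pseudo = assign_pseudo_labels(probs, thresholds)   # [None, 1, 0]
weights = inverse_propensity_weights([80, 5], [100, 100])
```

In this toy run the rare class accepts a prediction at confidence 0.60 while the frequent class rejects one at 0.90, and labeled samples of the under-labeled class receive a larger loss weight. A doubly robust estimator in the sense of the abstract would combine both terms; how CADR does so is specified in the paper and the linked repository, not here.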

Code Implementations: 1 repo
