CV, Oct 26, 2021

DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

arXiv:2110.13740v1, 32 citations
Originality: Incremental advance
AI Analysis

This paper addresses labeled-data scarcity, a critical obstacle to deep learning in practice, and offers a robust approach to SSL under minimal supervision. It appears incremental, however, because it builds on existing data programming methods.

The paper addresses the poor performance of semi-supervised learning (SSL) when very few labeled samples are available. It proposes DP-SSL, which uses data programming to generate probabilistic labels for unlabeled data and reaches 93.46% test accuracy on CIFAR-10 with only 40 labeled samples, which the authors report as higher than prior state-of-the-art results.

The scarcity of labeled data is a critical obstacle to deep learning. Semi-supervised learning (SSL) provides a promising way to leverage unlabeled data by pseudo labels. However, when the size of labeled data is very small (say a few labeled samples per class), SSL performs poorly and unstably, possibly due to the low quality of learned pseudo labels. In this paper, we propose a new SSL method called DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Different from existing DP methods that rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach to automatically generate LFs from scratch in SSL style. With the noisy labels produced by the LFs, we design a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples. Extensive experiments on four standard SSL benchmarks show that DP-SSL can provide reliable labels for unlabeled data and achieve better classification performance on test sets than existing SSL methods, especially when only a small number of labeled samples are available. Concretely, for CIFAR-10 with only 40 labeled samples, DP-SSL achieves 93.82% annotation accuracy on unlabeled data and 93.46% classification accuracy on test data, which are higher than the SOTA results.
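
To make concrete the step of turning conflicting, overlapping LF votes into probabilistic labels, here is a minimal sketch in Python. It is not the label model from the paper: it substitutes a simple naive-Bayes-style aggregation that treats LFs as independent, and it assumes each LF's accuracy is already known, whereas a real label model would estimate such quantities. The function name `probabilistic_labels`, the `ABSTAIN` convention, and the example accuracies are all hypothetical.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: an LF outputs -1 when it casts no vote


def probabilistic_labels(votes, lf_accuracy, num_classes):
    """Aggregate noisy LF votes into per-sample class probabilities.

    votes: (n_samples, n_lfs) integer array of class ids or ABSTAIN.
    lf_accuracy: (n_lfs,) assumed accuracy of each LF, strictly in (0, 1).
    Returns an (n_samples, num_classes) array whose rows sum to 1.
    """
    votes = np.asarray(votes)
    acc = np.asarray(lf_accuracy, dtype=float)
    n_samples, n_lfs = votes.shape

    # Under the independence assumption, a vote is correct with probability
    # acc and otherwise spread uniformly over the remaining classes.
    log_hit = np.log(acc)
    log_miss = np.log((1.0 - acc) / (num_classes - 1))

    log_post = np.zeros((n_samples, num_classes))
    for j in range(n_lfs):
        voted = votes[:, j] != ABSTAIN
        # Every class first receives the "miss" term for this vote ...
        log_post[voted] += log_miss[j]
        # ... and the voted class is then upgraded to the "hit" term.
        log_post[voted, votes[voted, j]] += log_hit[j] - log_miss[j]

    # Normalise with a numerically stable softmax.
    log_post -= log_post.max(axis=1, keepdims=True)
    probs = np.exp(log_post)
    return probs / probs.sum(axis=1, keepdims=True)


# Three samples, three hypothetical LFs, three classes.
example_votes = np.array([
    [0, 0, 1],                     # two LFs say class 0, one disagrees
    [2, ABSTAIN, 2],               # overlap with one abstention
    [ABSTAIN, ABSTAIN, ABSTAIN],   # no LF votes: result is uniform
])
example_probs = probabilistic_labels(example_votes, [0.9, 0.8, 0.6], 3)
```

In this sketch, disagreement among LFs lowers the confidence of the winning class instead of being discarded, and a sample on which every LF abstains falls back to a uniform distribution.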

