LG, AI, CV | Aug 27, 2023

Pruning the Unlabeled Data to Improve Semi-Supervised Learning

arXiv:2308.14058v1 | 2 citations | h-index: 46
Originality: Highly original
AI Analysis

This work addresses performance issues in semi-supervised learning for image classification, offering a practical improvement over conventional methods.

The paper tackles suboptimal performance in semi-supervised learning by proposing PruneSSL, a technique that prunes the unlabeled data to improve its separability. The authors report state-of-the-art results on several image classification tasks.

In the domain of semi-supervised learning (SSL), the conventional approach involves training a learner with a limited amount of labeled data alongside a substantial volume of unlabeled data, both drawn from the same underlying distribution. However, for deep learning models, this standard practice may not yield optimal results. In this research, we propose an alternative perspective, suggesting that distributions that are more readily separable could offer superior benefits to the learner as compared to the original distribution. To achieve this, we present PruneSSL, a practical technique for selectively removing examples from the original unlabeled dataset to enhance its separability. We present an empirical study, showing that although PruneSSL reduces the quantity of available training data for the learner, it significantly improves the performance of various competitive SSL algorithms, thereby achieving state-of-the-art results across several image classification tasks.
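The abstract does not say how PruneSSL decides which unlabeled examples to remove, so the following is only a minimal sketch of the general idea of pruning for separability, not the authors' method. It assumes a hypothetical criterion: compute class centroids from the labeled features, score each unlabeled point by the gap between its distances to the two nearest centroids, and keep the points with the largest gap. The function name `prune_unlabeled` and the parameter `keep_fraction` are illustrative assumptions.

```python
import numpy as np


def prune_unlabeled(x_labeled, y_labeled, x_unlabeled, keep_fraction=0.5):
    """Return indices of unlabeled points to keep.

    Hypothetical criterion (an assumption, not taken from the paper):
    keep the points whose nearest labeled-class centroid is much closer
    than the second nearest, i.e. points far from class boundaries.
    Requires at least two classes in y_labeled.
    """
    classes = np.unique(y_labeled)
    # One centroid per class, computed from the labeled features only.
    centroids = np.stack(
        [x_labeled[y_labeled == c].mean(axis=0) for c in classes]
    )
    # dists[i, k] is the Euclidean distance from unlabeled point i to centroid k.
    dists = np.linalg.norm(
        x_unlabeled[:, None, :] - centroids[None, :, :], axis=2
    )
    # Margin = distance to second-nearest centroid minus distance to nearest.
    sorted_dists = np.sort(dists, axis=1)
    margin = sorted_dists[:, 1] - sorted_dists[:, 0]
    # Keep the requested fraction of points with the largest margins.
    n_keep = max(1, int(round(keep_fraction * len(x_unlabeled))))
    return np.sort(np.argsort(margin)[-n_keep:])
```

The returned indices select a smaller unlabeled set, for example `x_unlabeled[keep_idx]`, which would then be passed to whichever SSL algorithm is being trained. In practice the features would likely come from a learned embedding rather than raw pixels; that choice is also left as an assumption here.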

