LG · AI · CV · Aug 3, 2024

Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples

arXiv:2408.01872v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses a practical issue in semi-supervised image classification. It offers an incremental improvement over previous safe methods by making better use of unlabeled data.

The paper tackles class distribution mismatch in semi-supervised learning, where out-of-distribution (OOD) data in the unlabeled set degrade performance. It proposes self-supervised contrastive learning combined with a new loss function that aggregates in-distribution examples as positives. This improves classification accuracy on datasets such as CIFAR-10 and CIFAR-100 under various mismatch ratios.

Semi-supervised learning methods have shown promising results on many practical problems where only a few labels are available. Existing methods assume that labeled and unlabeled data share the same class distribution. However, their performance degrades significantly under class distribution mismatch, where the unlabeled data contain out-of-distribution (OOD) examples. Previous safe semi-supervised learning studies have addressed this problem by reducing the influence of OOD data on training guided by the labeled data. Even when these methods effectively filter out unnecessary OOD data, they can lose the basic information that all data share regardless of class. To address this, we propose applying self-supervised contrastive learning to fully exploit the large amount of unlabeled data. We also propose a contrastive loss function with a coefficient schedule. For a labeled anchor, this loss aggregates labeled examples of the same class, which would otherwise be treated as negatives, into positive examples. To evaluate the proposed method, we conduct experiments on the image classification datasets CIFAR-10, CIFAR-100, Tiny ImageNet, and CIFAR-100+Tiny ImageNet under various mismatch ratios. The results show that self-supervised contrastive learning significantly improves classification accuracy. Moreover, aggregating the in-distribution examples produces better representations and consequently further improves classification accuracy.
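The abstract does not give the exact form of the loss or its schedule. The following is therefore a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes a SimCLR-style setup with two augmented views per image. Each anchor's other view is always a positive with weight 1. For a labeled anchor, other labeled examples of the same class are also positives, with weight `alpha`. The linear ramp in `coefficient_schedule`, the temperature `tau`, and all function and parameter names are hypothetical choices made for illustration.

```python
import numpy as np


def coefficient_schedule(epoch: int, warmup_epochs: int = 100) -> float:
    """Hypothetical linear ramp (0 -> 1) for the same-class positive weight."""
    return min(1.0, epoch / warmup_epochs)


def aggregated_contrastive_loss(z: np.ndarray, labels: np.ndarray,
                                alpha: float, tau: float = 0.5) -> float:
    """Weighted contrastive loss over 2N embeddings (two views of N images).

    z:      (2N, d) embeddings; rows i and i + N are two views of image i.
    labels: (N,) class ids for labeled images, -1 for unlabeled images.
    alpha:  weight for same-class labeled examples promoted to positives
            (an assumed mechanism, not taken from the paper).
    """
    n2 = z.shape[0]
    n = n2 // 2
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # an anchor is never compared with itself

    # Numerically stable log-softmax over all other samples for each anchor.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positive weights: the other augmented view always gets weight 1.
    lab = np.concatenate([labels, labels])
    idx = np.arange(n2)
    weights = np.zeros((n2, n2))
    weights[idx, (idx + n) % n2] = 1.0

    # Labeled same-class pairs (normally negatives) get weight alpha.
    same_class = (lab[:, None] == lab[None, :]) & (lab[:, None] >= 0)
    np.fill_diagonal(same_class, False)
    weights = np.where(same_class & (weights == 0), alpha, weights)

    # Select log-probs only where a weight applies (avoids 0 * -inf = nan).
    pos_log_prob = np.where(weights > 0, log_prob, 0.0)
    per_anchor = -(weights * pos_log_prob).sum(axis=1) / weights.sum(axis=1)
    return float(per_anchor.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))        # 4 images x 2 views
    labels = np.array([0, 0, 1, -1])    # the last image is unlabeled
    for epoch in (0, 50, 100):
        a = coefficient_schedule(epoch)
        print(epoch, a, aggregated_contrastive_loss(z, labels, alpha=a))
```

With `alpha = 0`, or when every example is unlabeled, the sketch reduces to the standard SimCLR NT-Xent loss. Ramping `alpha` up over training gradually treats same-class labeled examples as positives instead of negatives. This is one simple way a "coefficient schedule" could phase in label information after the self-supervised signal has shaped the representation.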
