CV, LG · Nov 3, 2023

SemiGPC: Distribution-Aware Label Refinement for Imbalanced Semi-Supervised Learning Using Gaussian Processes

Abdelhak Lemkhenter, Manchen Wang, Luca Zancato, Gurumurthy Swaminathan, Paolo Favaro, Davide Modolo

arXiv:2311.01646v1 · 1.5 · h-index: 45

Originality: Incremental advance

AI Analysis

This paper addresses confirmation bias in semi-supervised learning under class imbalance, a domain-specific problem for machine learning practitioners. The advance is incremental: it builds on existing methods such as FixMatch and SimMatch.

The paper tackles class imbalance in semi-supervised learning by introducing SemiGPC, a distribution-aware label refinement strategy based on Gaussian Processes. It reports state-of-the-art results on CIFAR10-LT and CIFAR100-LT, and an average accuracy gain of about 2% over a new competitive baseline on the more challenging SemiAves, SemiCUB, SemiFungi and Semi-iNat benchmarks.

In this paper we introduce SemiGPC, a distribution-aware label refinement strategy based on Gaussian Processes, where the predictions of the model are derived from the posterior distribution of the labels. Unlike other buffer-based semi-supervised methods such as CoMatch and SimMatch, SemiGPC includes a normalization term that addresses imbalances in the global data distribution while maintaining local sensitivity. This explicit control allows SemiGPC to be more robust to confirmation bias, especially under class imbalance. We show that SemiGPC improves performance when paired with different semi-supervised methods such as FixMatch, ReMixMatch, SimMatch and FreeMatch, and with different pre-training strategies including MSN and DINO. We also show that SemiGPC achieves state-of-the-art results under different degrees of class imbalance on the standard CIFAR10-LT and CIFAR100-LT benchmarks, especially in the low-data regime. Using SemiGPC also yields an average accuracy increase of about 2% compared to a new competitive baseline on the more challenging benchmarks SemiAves, SemiCUB, SemiFungi and Semi-iNat.
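To illustrate why a Gaussian Process posterior can behave differently from plain similarity averaging over a buffer, here is a minimal sketch. It is not the paper's implementation: it assumes a textbook GP regression posterior mean with an RBF kernel on one-hot labels, and the function names (`rbf_kernel`, `gp_posterior_mean`, `kernel_average`), the noise level, the length scale and the toy data are all hypothetical choices made for this example. The abstract does not specify SemiGPC's kernel or the exact form of its normalization term.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def gp_posterior_mean(buf_x, buf_y, query_x, num_classes, noise=0.1, length_scale=1.0):
    # Standard GP regression posterior mean: K_* (K + noise * I)^{-1} Y.
    # The inverse of the buffer Gram matrix acts as a global normalization:
    # many near-duplicate majority samples share their weight instead of adding up.
    Y = np.eye(num_classes)[buf_y]
    K = rbf_kernel(buf_x, buf_x, length_scale)
    K_star = rbf_kernel(query_x, buf_x, length_scale)
    alpha = np.linalg.solve(K + noise * np.eye(len(buf_x)), Y)
    return K_star @ alpha

def kernel_average(buf_x, buf_y, query_x, num_classes, length_scale=1.0):
    # Similarity-weighted label average with no Gram-matrix normalization
    # (an assumed stand-in for purely local buffer aggregation).
    Y = np.eye(num_classes)[buf_y]
    W = rbf_kernel(query_x, buf_x, length_scale)
    return (W @ Y) / W.sum(axis=1, keepdims=True)

# Toy imbalanced buffer: 20 majority samples at x=0 (class 0), 1 minority sample at x=2 (class 1).
buf_x = np.array([[0.0]] * 20 + [[2.0]])
buf_y = np.array([0] * 20 + [1])
query_x = np.array([[1.2]])  # closer to the minority sample than to the majority cluster

gp_scores = gp_posterior_mean(buf_x, buf_y, query_x, num_classes=2)
avg_scores = kernel_average(buf_x, buf_y, query_x, num_classes=2)

print("GP posterior mean scores:", gp_scores[0])
print("Kernel average scores:   ", avg_scores[0])
```

In this toy setup the unnormalized average is dominated by the 20 majority samples, while the GP posterior mean assigns the query to the nearer minority class. This only illustrates the general effect of Gram-matrix normalization under imbalance; it makes no claim about how SemiGPC weights or thresholds its pseudo-labels.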
