LGNov 25, 2025

DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning

arXiv:2511.20225v21 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of limited labeled data for multi-label learning tasks, offering an incremental improvement over existing pseudo-labeling methods.

The paper tackles the problem of noisy pseudo-labels degrading performance in semi-supervised multi-label learning by proposing DiCaP, a correctness-aware framework that calibrates pseudo-label weights based on estimated correctness likelihood, resulting in improvements of up to 4.27% over state-of-the-art methods on benchmark datasets.

Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model's performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes