LG | Sep 29, 2025

Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

arXiv:2509.24228v1 | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a methodological gap for researchers in weakly supervised learning by providing a standardized evaluation framework. It is incremental in that it builds on existing PU learning methods.

The paper tackles the inconsistent and unrealistic evaluation of positive-unlabeled (PU) learning algorithms by proposing what the authors describe as the first PU learning benchmark. It identifies two critical factors, validation sets that unrealistically contain negative data and evaluation protocols biased towards the one-sample setting, and introduces a calibration approach to ensure fair comparisons within and across the two families of methods.

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem, in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, experimental settings are highly inconsistent, making it difficult to identify which algorithm performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, the problem settings and solutions of PU learning have different families, i.e., the one-sample and two-sample settings. However, existing evaluation protocols are heavily biased towards the one-sample setting and neglect the significant difference between them. We identify the internal label shift problem of unlabeled training data for the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.
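The difference between the two settings can be illustrated with a small simulation. The sketch below is not the paper's benchmark code or its calibration approach; it only shows one common reading of why the unlabeled data in the one-sample setting differ from those in the two-sample setting. It assumes that in the one-sample setting a single sample is drawn from the marginal distribution and each positive is labeled with a fixed probability, while in the two-sample setting the unlabeled set is drawn independently from the marginal distribution. The function names and the parameters `class_prior` and `label_freq` are hypothetical and chosen for illustration.

```python
import random


def one_sample_unlabeled_prior(class_prior: float, label_freq: float) -> float:
    """Closed-form positive fraction of the unlabeled set in the one-sample setting.

    Assumption: each positive is labeled independently with probability
    label_freq, and everything not labeled is treated as unlabeled.
    """
    return class_prior * (1 - label_freq) / (1 - class_prior * label_freq)


def simulate_one_sample(n: int, class_prior: float, label_freq: float, seed: int = 0) -> float:
    """Draw one sample, label some positives, return the positive fraction of the rest."""
    rng = random.Random(seed)
    unlabeled = []
    for _ in range(n):
        y = 1 if rng.random() < class_prior else 0
        is_labeled = y == 1 and rng.random() < label_freq
        if not is_labeled:
            unlabeled.append(y)
    return sum(unlabeled) / len(unlabeled)


def simulate_two_sample(n_unlabeled: int, class_prior: float, seed: int = 0) -> float:
    """Draw the unlabeled set on its own from the marginal distribution."""
    rng = random.Random(seed)
    ys = [1 if rng.random() < class_prior else 0 for _ in range(n_unlabeled)]
    return sum(ys) / len(ys)


if __name__ == "__main__":
    prior, freq = 0.5, 0.5
    print("one-sample, closed form:", one_sample_unlabeled_prior(prior, freq))
    print("one-sample, simulated:  ", simulate_one_sample(200_000, prior, freq))
    print("two-sample, simulated:  ", simulate_two_sample(200_000, prior))
```

Under these assumptions, removing labeled positives leaves the one-sample unlabeled set with a smaller positive fraction than the class prior, whereas the two-sample unlabeled set keeps the class prior. An algorithm designed for one setting and evaluated on data generated under the other is therefore given a mismatched class proportion, which is the kind of gap a fair cross-family comparison has to account for.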
