LG · Jul 10, 2025

Learning from positive and unlabeled examples - Finite size sample bounds

arXiv:2507.07354v1 · 3 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses a key limitation of PU learning in applications where the class prior is not known in advance. The advance is incremental, since it builds on existing theoretical frameworks.

The paper tackles the problem of Positive Unlabeled (PU) learning without assuming the class prior is known, providing theoretical upper and lower bounds on sample sizes for both positive and unlabeled data.

PU (Positive Unlabeled) learning is a variant of supervised classification learning in which the only labels revealed to the learner are of positively labeled instances. PU learning arises in many real-world applications. Most existing work relies on the simplifying assumptions that the positively labeled training data is drawn from the restriction of the data generating distribution to positively labeled instances and/or that the proportion of positively labeled points (a.k.a. the class prior) is known a priori to the learner. This paper provides a theoretical analysis of the statistical complexity of PU learning under a wider range of setups. Unlike most prior work, our study does not assume that the class prior is known to the learner. We prove upper and lower bounds on the required sample sizes (of both the positively labeled and the unlabeled samples).
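To make the setup concrete, here is a minimal toy sketch of the data a PU learner sees. It is not the paper's construction or algorithm. It assumes a hypothetical one-dimensional problem in which the marginal is Uniform[0, 1] and the positive region is an interval [threshold, 1], and it uses the simplifying assumption named in the abstract that positives are drawn from the marginal restricted to the positive region. The names `sample_pu` and `fit_threshold` are illustrative only.

```python
import random


def sample_pu(n_pos, n_unl, class_prior, seed=0):
    """Draw a toy PU training set on [0, 1].

    Assumptions (for illustration, not from the paper): the marginal is
    Uniform[0, 1] and a point is positive exactly when x >= threshold.
    """
    rng = random.Random(seed)
    # Under Uniform[0, 1], P(x >= threshold) equals the class prior.
    threshold = 1.0 - class_prior

    # Positive sample: the marginal restricted to the positive region.
    positives = [rng.uniform(threshold, 1.0) for _ in range(n_pos)]

    # Unlabeled sample: drawn from the full marginal, true labels hidden.
    unlabeled = [rng.uniform(0.0, 1.0) for _ in range(n_unl)]
    return positives, unlabeled, threshold


def fit_threshold(positives, unlabeled):
    """Naive learner that is never told the class prior.

    It picks the smallest threshold consistent with the positive sample,
    then estimates the class prior as the fraction of unlabeled points
    that fall in the predicted positive region.
    """
    est_threshold = min(positives)
    est_prior = sum(x >= est_threshold for x in unlabeled) / len(unlabeled)
    return est_threshold, est_prior
```

With `sample_pu(200, 2000, 0.3)`, the learner receives 200 labeled positives and 2000 unlabeled points, and `fit_threshold` recovers a threshold slightly above the true one. How large each of the two samples must be for such guarantees in general is the question the paper's bounds address.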

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
