STML Jan 17, 2022

Risk bounds for PU learning under Selected At Random assumption

arXiv:2201.06277v1
Originality: Synthesis-oriented
AI Analysis

This work provides theoretical guarantees for PU learning, which matter to researchers and practitioners working with incompletely labeled data in binary classification tasks. It is incremental, however, since it builds on existing methodologies.

The paper establishes risk bounds for positive-unlabeled learning under the Selected At Random assumption, where the labeling probability depends on the covariates. It also quantifies the impact of label noise relative to standard classification and proves that the upper bound is almost optimal.

Positive-unlabeled learning (PU learning) is known as a special case of semi-supervised binary classification where only a fraction of positive examples are labeled. The challenge is then to find the correct classifier despite this lack of information. Recently, new methodologies have been introduced to address the case where the probability of being labeled may depend on the covariates. In this paper, we are interested in establishing risk bounds for PU learning under this general assumption. In addition, we quantify the impact of label noise on PU learning compared to the standard classification setting. Finally, we provide a lower bound on the minimax risk, proving that the upper bound is almost optimal.
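To make the labeling mechanism concrete, here is a minimal simulation sketch of PU data in which the probability of being labeled depends on the covariate. It is not taken from the paper: the function name `simulate_sar`, the logistic form of the class posterior, and the linear propensity are all hypothetical choices made for illustration. The only structural facts it encodes are that a label can be observed only for positive examples, and that a positive example at `x` is labeled with a probability `e(x)` that varies with `x`.

```python
import numpy as np

def simulate_sar(n, seed=0):
    """Draw n PU samples with a covariate-dependent labeling probability.

    All functional forms below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # One-dimensional covariate, uniform on [-1, 1].
    x = rng.uniform(-1.0, 1.0, size=n)
    # Hypothetical class posterior eta(x) = P(Y = 1 | X = x).
    eta = 1.0 / (1.0 + np.exp(-4.0 * x))
    # True class, never observed directly in PU learning.
    y = rng.binomial(1, eta)
    # Hypothetical propensity e(x) = P(S = 1 | X = x, Y = 1), in [0.2, 0.8].
    e = 0.2 + 0.6 * (x + 1.0) / 2.0
    # Observed indicator: only positives can be labeled.
    s = y * rng.binomial(1, e)
    return x, y, s, eta, e

x, y, s, eta, e = simulate_sar(200_000)
# Labeled examples are always positive; unlabeled ones mix both classes.
print("fraction labeled:", s.mean())
print("fraction of positives among unlabeled:", y[s == 0].mean())
```

In this sketch the observed indicator satisfies P(S = 1 | X = x) = e(x) eta(x), so a classifier trained naively on `s` instead of `y` targets a distorted posterior. That distortion is what makes the covariate-dependent setting harder than the case of a constant labeling probability.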

