ML · LG · Jul 14, 2024

Augmented prediction of a true class for Positive Unlabeled data under selection bias

arXiv:2407.10309v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

The paper addresses a practical scenario for researchers working with PU data under selection bias and highlights the risk of misapplying classical classification rules in that setting. The advance is incremental.

The paper tackles the problem of predicting true classes in Positive Unlabeled (PU) data under selection bias, where labels of the observations are also available at prediction time. It shows that a variational autoencoder-based method performs on par with or better than the other variants considered and improves accuracy for unlabeled samples over feature-only methods.

We introduce a new observational setting for Positive Unlabeled (PU) data in which the observations at prediction time are also labeled. This occurs commonly in practice; we argue that the additional information is important for prediction, and call this task "augmented PU prediction". We allow labeling to be feature-dependent. In this scenario, the Bayes classifier and its risk are established and compared with the risk of a classifier that, for unlabeled data, is based only on predictors. We introduce several variants of the empirical Bayes rule for this scenario and investigate their performance. We emphasise the dangers (and ease) of applying a classical classification rule in the augmented PU scenario: because there are no preexisting studies, an unaware researcher is prone to skewing the obtained predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for the PU scenario works on par with or better than the other considered variants and yields an advantage over feature-only methods in terms of accuracy for unlabeled samples.
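To make the contrast in the abstract concrete, here is a minimal sketch of why the label indicator matters at prediction time. It is an illustration under stated assumptions, not the paper's estimator: the posterior p = P(y=1 | x) and the propensity e = P(s=1 | x, y=1) are assumed known, only positives can be labeled (so s=1 implies y=1), and the function names are hypothetical. Under these assumptions Bayes' theorem gives P(y=1 | x, s=0) = (1 - e) p / (1 - e p), which is never larger than p.

```python
def posterior_given_unlabeled(p: float, e: float) -> float:
    """Return P(y=1 | x, s=0).

    p: assumed known posterior P(y=1 | x)
    e: assumed known propensity P(s=1 | x, y=1)
    """
    prob_unlabeled = 1.0 - e * p  # P(s=0 | x), since only positives get labeled
    if prob_unlabeled <= 0.0:
        raise ValueError("x is never unlabeled when e * p == 1")
    return (1.0 - e) * p / prob_unlabeled


def augmented_predict(p: float, e: float, s: int) -> int:
    """Predict the true class using both the features and the label indicator s."""
    if s == 1:
        return 1  # a labeled observation is positive by the PU assumption
    return int(posterior_given_unlabeled(p, e) > 0.5)


def feature_only_predict(p: float) -> int:
    """Classical rule that ignores s and thresholds P(y=1 | x)."""
    return int(p > 0.5)


# Unlabeled point with p = 0.6 and e = 0.5:
# P(y=1 | x, s=0) = 0.3 / 0.7, which is below 0.5.
p, e = 0.6, 0.5
print(feature_only_predict(p))       # classical rule predicts the positive class
print(augmented_predict(p, e, s=0))  # augmented rule predicts the negative class
```

In this example the two rules disagree on an unlabeled point: knowing that the point was not selected for labeling lowers the probability that it is positive, which is the kind of skew the abstract warns about when a classical rule is applied unchanged.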

Code Implementations: 1 repo