ML, LG | Feb 28, 2025

Prior shift estimation for positive unlabeled data through the lens of kernel embedding

Jan Mielniczuk, Wojciech Rejchel, Paweł Teisseyre

arXiv:2502.21194v24.51 citations | h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses a specific challenge in PU learning for machine learning practitioners, offering an incremental improvement over prior methods.

The paper tackles estimation of class prior shift in positive-unlabeled (PU) learning, where the source and target distributions differ. It introduces a direct estimator based on kernel embeddings and distribution matching, with proven asymptotic consistency and non-asymptotic bounds, and reports performance competitive with or better than existing methods.

We study estimation of a class prior for unlabeled target samples which possibly differs from that of the source population. Moreover, it is assumed that the source data are only partially observable: only samples from the positive class and from the whole population are available (the PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimation of posterior probabilities in both populations and has a simple geometric interpretation. It is based on a distribution matching technique together with kernel embedding in a Reproducing Kernel Hilbert Space and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study finite-sample behaviour on synthetic and real data and show that the proposal performs on par with or better than its competitors.
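To make the idea of distribution matching with kernel embeddings concrete, here is a minimal Python sketch. It is not taken from the paper and may differ from the authors' estimator. It assumes that the class-conditional distributions are the same in source and target (label shift), that the source prior `src_prior` is known, and that a Gaussian kernel with a hand-picked bandwidth `gamma` is used. Under these assumptions the target embedding lies on the line through the positive and negative class embeddings, so projecting onto that line gives a closed-form estimate. The names `rbf_gram_mean`, `estimate_target_prior` and `sample_mixture` are hypothetical.

```python
import numpy as np

def rbf_gram_mean(a, b, gamma):
    """Average of k(x, y) = exp(-gamma * ||x - y||^2) over all pairs (x in a, y in b)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return float(np.exp(-gamma * np.maximum(sq, 0.0)).mean())

def estimate_target_prior(pos, src_unl, tgt_unl, src_prior, gamma=0.5):
    """Closed-form kernel-mean-matching estimate of the target class prior.

    Assumes label shift and a known source prior (assumptions of this sketch).
    With mu_p, mu_u, mu_t the empirical embeddings of the positive, source
    unlabeled and target unlabeled samples, it returns
        <(1 - pi) mu_t - mu_u + pi mu_p, mu_p - mu_u> / ||mu_p - mu_u||^2,
    clipped to [0, 1].
    """
    k_pp = rbf_gram_mean(pos, pos, gamma)
    k_pu = rbf_gram_mean(pos, src_unl, gamma)
    k_uu = rbf_gram_mean(src_unl, src_unl, gamma)
    k_tp = rbf_gram_mean(tgt_unl, pos, gamma)
    k_tu = rbf_gram_mean(tgt_unl, src_unl, gamma)
    num = (1.0 - src_prior) * (k_tp - k_tu) - (k_pu - k_uu) + src_prior * (k_pp - k_pu)
    den = k_pp - 2.0 * k_pu + k_uu  # squared RKHS distance between mu_p and mu_u
    return float(np.clip(num / den, 0.0, 1.0))

def sample_mixture(n, prior, rng):
    """Toy data: positives ~ N((+1, +1), I), negatives ~ N((-1, -1), I)."""
    is_pos = rng.random(n) < prior
    means = np.where(is_pos[:, None], 1.0, -1.0)
    return means + rng.standard_normal((n, 2))

rng = np.random.default_rng(0)
pos = sample_mixture(2000, 1.0, rng)      # labeled positives from the source
src_unl = sample_mixture(2000, 0.5, rng)  # unlabeled source, prior 0.5
tgt_unl = sample_mixture(2000, 0.3, rng)  # unlabeled target, prior 0.3
pi_hat = estimate_target_prior(pos, src_unl, tgt_unl, src_prior=0.5)
print(f"estimated target prior: {pi_hat:.3f}")
```

The sketch uses only averages of kernel evaluations, so no posterior probabilities are estimated. The bandwidth and the toy Gaussian data are illustrative choices, and the plain averages (which include diagonal terms) are slightly biased for small samples.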
