ML LG Sep 16, 2022

Double logistic regression approach to biased positive-unlabeled data

Konrad Furmańczyk, Jan Mielniczuk, Wojciech Rejchel, Paweł Teisseyre

arXiv:2209.07787v27.93 citations h-index: 19

Originality: Incremental advance

AI Analysis

The paper addresses a limitation of positive-unlabeled learning in applications where the constant propensity assumption is unrealistic, though it is an incremental improvement over existing methods.

The paper tackles positive-unlabeled learning without the unrealistic constant propensity score assumption by proposing a parametric approach to joint estimation of the posterior probability and propensity score functions. Experimental results show that the proposed methods are comparable to or better than existing Expectation-Maximization based methods.

Positive and unlabelled learning is an important problem which arises naturally in many applications. A significant limitation of almost all existing methods is the assumption that the propensity score function is constant (the SCAR assumption), which is unrealistic in many practical situations. Avoiding this assumption, we consider a parametric approach to the problem of joint estimation of the posterior probability and propensity score functions. We show that under mild assumptions, when both functions have the same parametric form (e.g. logistic with different parameters), the corresponding parameters are identifiable. Motivated by this, we propose two approaches to their estimation: a joint maximum likelihood method and a second approach based on alternating maximization of two Fisher consistent expressions. Our experimental results show that the proposed methods are comparable to or better than the existing methods based on the Expectation-Maximisation scheme.
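To make the joint maximum likelihood idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the common single-sample PU setup in which an example is labelled (s = 1) only if it is positive and then selected, so that P(s = 1 | x) = σ(xᵀβ) · σ(xᵀγ), with σ(xᵀβ) playing the role of the posterior and σ(xᵀγ) the propensity score. The function names, the L-BFGS-B optimizer and the random initialisation are illustrative choices, not taken from the abstract.

```python
import numpy as np
from scipy.optimize import minimize


def log_sigmoid(z):
    # Numerically stable log(sigma(z)) = -log(1 + exp(-z)).
    return -np.logaddexp(0.0, -z)


def neg_log_likelihood(theta, X, s):
    """Negative log-likelihood of the labelling indicator s under the
    assumed model P(s=1|x) = sigma(x'beta) * sigma(x'gamma)."""
    p = X.shape[1]
    beta, gamma = theta[:p], theta[p:]
    # log P(s=1|x): sum of the two log-sigmoids.
    log_p1 = log_sigmoid(X @ beta) + log_sigmoid(X @ gamma)
    # Keep log_p1 strictly below 0 so log(1 - P(s=1|x)) stays finite.
    log_p1 = np.minimum(log_p1, -1e-12)
    log_p0 = np.log1p(-np.exp(log_p1))
    return -np.sum(s * log_p1 + (1.0 - s) * log_p0)


def fit_double_logistic(X, s, seed=0):
    """Jointly estimate (beta, gamma) by minimising the negative
    log-likelihood. Hypothetical helper for illustration only."""
    p = X.shape[1]
    rng = np.random.default_rng(seed)
    # Small random start: beginning with beta == gamma would keep the
    # two vectors identical, since the objective is symmetric in them.
    theta0 = rng.normal(scale=0.1, size=2 * p)
    res = minimize(neg_log_likelihood, theta0, args=(X, s), method="L-BFGS-B")
    return res.x[:p], res.x[p:], res.fun
```

Because the product σ(xᵀβ) · σ(xᵀγ) is unchanged when β and γ are swapped, this sketch cannot decide on its own which fitted vector corresponds to the posterior and which to the propensity score; that assignment needs additional information.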
