LG, ML · Aug 27, 2018

Learning from Positive and Unlabeled Data under the Selected At Random Assumption

arXiv:1808.08755v1 · 17 citations
Originality: Incremental advance
AI Analysis

The paper addresses a common challenge in tasks such as medical diagnosis and web page classification, where only some positive examples are labeled and the rest of the data is unlabeled. It offers a more flexible labeling assumption, though the advance over prior assumptions is incremental.

The paper tackles learning from positive and unlabeled data by proposing an assumption weaker than the existing ones: positive examples are selected for labeling at random, conditioned on some of the attributes. It introduces an EM method for learning under this assumption and reports that the method also outperforms state-of-the-art approaches in the stronger selected completely at random setting.

For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about the true distribution of the classes and/or the mechanism that was used to select the positive examples to be labeled. The commonly made assumptions, separability of the classes and positive examples being selected completely at random, are very strong. This paper proposes a weaker assumption that assumes the positive examples to be selected at random, conditioned on some of the attributes. To learn under this assumption, an EM method is proposed. Experiments show that our method is not only very capable of learning under this assumption, but it also outperforms the state of the art for learning under the selected completely at random assumption.
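The selected at random assumption described above can be illustrated with a short sketch. It is not the authors' implementation. It assumes two estimates are already available for an example x: a classifier probability f = Pr(y=1 | x) and a labeling propensity e = Pr(s=1 | y=1, x), where s indicates whether the example was labeled. It also assumes only positive examples are ever labeled. The function names are hypothetical. Under those assumptions, Bayes' rule gives the soft label that an E-step of an EM procedure could assign to each example.

```python
def label_probability(f, e):
    """Pr(s=1 | x): an example is labeled only if it is positive AND selected.

    f: assumed estimate of Pr(y=1 | x)
    e: assumed estimate of the propensity Pr(s=1 | y=1, x)
    """
    return f * e


def expected_positive(f, e, labeled):
    """Pr(y=1 | s, x), usable as a soft label in an E-step (hypothetical helper).

    Labeled examples are positive by assumption. For unlabeled examples,
    Bayes' rule gives f * (1 - e) / (1 - f * e).
    """
    if labeled:
        return 1.0
    denom = 1.0 - f * e  # Pr(s=0 | x)
    if denom <= 0.0:
        # Degenerate case: the model claims every such example would be
        # labeled, so an unlabeled one carries no positive mass.
        return 0.0
    return f * (1.0 - e) / denom


if __name__ == "__main__":
    # An unlabeled example with f = 0.5 looks less likely to be positive
    # the more often positives like it tend to get labeled.
    for e in (0.0, 0.5, 0.9):
        print(e, expected_positive(0.5, e, labeled=False))
```

When the propensity is the same constant for every example, this reduces to the selected completely at random setting. In a full EM loop, an M-step would refit both the classifier and the propensity model on these soft labels. That step is omitted here.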

