LG AI · Jun 1, 2022

Positive Unlabeled Contrastive Learning

Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon

OpenAI

arXiv:2206.01206v3 · 11.116 citations · h-index: 91

Originality: Highly original

AI Analysis

This paper addresses the problem of limited labeled data in binary classification, a common obstacle for machine learning practitioners. Its approach builds on existing paradigms but is novel rather than incremental.

The paper tackles learning binary classifiers with only a few labeled positive samples and many unlabeled samples by extending contrastive learning to the positive unlabeled setting, achieving superior performance over state-of-the-art methods on standard benchmarks without requiring class prior knowledge.

Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples and (often) a large number of unlabeled samples (which could be positive or negative). We first propose a simple extension of the standard InfoNCE family of contrastive losses to the PU setting, and show that it learns superior representations compared to existing unsupervised and supervised approaches. We then develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme; these pseudo-labels can then be used to train the final (positive vs. negative) classifier. Our method handily outperforms state-of-the-art PU methods on several standard PU benchmark datasets, while not requiring a priori knowledge of any class prior (which is a common assumption in other PU methods). We also provide a simple theoretical analysis that motivates our methods.
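To make the idea of extending an InfoNCE-style loss to the PU setting more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not spell out the loss, so the sketch assumes that each sample's positive is its other augmented view, and that labeled positives additionally treat every other labeled positive as a positive. The function name `pu_info_nce`, its arguments, and the temperature value are hypothetical.

```python
import numpy as np

def pu_info_nce(z1, z2, is_labeled_pos, temperature=0.5):
    """Illustrative PU-style InfoNCE loss (assumed form, not the paper's code).

    z1, z2:         (n, d) embeddings of two augmented views of the same n samples.
    is_labeled_pos: (n,) boolean array, True for labeled positive samples.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an anchor is never contrasted with itself

    # Numerically stable log-softmax over all other samples in the batch.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )

    # Positive mask: the other view of the same sample is always a positive.
    idx = np.arange(2 * n)
    pos = np.zeros((2 * n, 2 * n), dtype=bool)
    pos[idx, (idx + n) % (2 * n)] = True

    # Assumption: labeled positives also attract all other labeled positives.
    lab = np.concatenate([is_labeled_pos, is_labeled_pos])
    pos |= np.outer(lab, lab)
    np.fill_diagonal(pos, False)

    # Average the negative log-probability over each anchor's positives.
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return per_anchor.mean()
```

When no sample is labeled, the mask reduces to the usual two-view pairing, so the function falls back to a standard self-supervised InfoNCE loss. Unlabeled samples are never assumed to be negatives, which is the point of the PU setting.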
