LG | AI | May 19, 2022

A Boosting Algorithm for Positive-Unlabeled Learning

arXiv:2205.09485v4 | 5 citations | h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses a practical classification challenge that arises when labels are incomplete, as in cyber security, but the advance is incremental: it adapts existing boosting methods to a specific data setting.

The paper tackles binary classification with only positive and unlabeled data by proposing AdaPU, a novel boosting algorithm. In the authors' experiments, AdaPU outperformed neural networks on several benchmark PU datasets, including a large-scale cyber security dataset.

Positive-unlabeled (PU) learning deals with binary classification problems when only positive (P) and unlabeled (U) data are available. Many recent PU methods are based on neural networks, but little has been done to develop boosting algorithms for PU learning, despite boosting algorithms' strong performance on many fully supervised classification problems. In this paper, we propose a novel boosting algorithm, AdaPU, for PU learning. Like AdaBoost, AdaPU aims to optimize an empirical exponential loss, but the loss is based on the PU data rather than on positive-negative (PN) data. As in AdaBoost, we learn a weighted combination of weak classifiers by learning one weak classifier and its weight at a time. However, AdaPU requires a very different algorithm for learning the weak classifiers and determining their weights. This is because AdaPU learns each weak classifier and its weight using a weighted PN dataset derived from the original PU data: the data weights are determined by the current weighted classifier combination, and some of them are negative. Our experiments showed that AdaPU outperforms neural networks on several benchmark PU datasets, including a large-scale challenging cyber security dataset.
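
To make the "weighted PN dataset with some negative weights" concrete, here is a minimal sketch in Python. The abstract does not state the loss formula, so the sketch assumes the standard unbiased PU risk with an exponential loss, where each positive example is counted once as a positive and once as a negative with a negative weight. The function names, the `prior` parameter (the assumed positive class prior), and the exact weighting are illustrative assumptions, not the authors' implementation.

```python
import math

def derived_pn_dataset(scores_p, scores_u, prior):
    """Build (label, weight) pairs for a weighted PN dataset from PU data.

    scores_p / scores_u: current combined-classifier scores F(x) on the
    positive and unlabeled examples. prior: assumed positive class prior.
    Assumption: weights follow the standard unbiased PU risk estimator
    with exponential loss; the paper's exact scheme may differ.
    """
    n_p, n_u = len(scores_p), len(scores_u)
    rows = []
    # Positives treated as positives: weight prior/n_p * exp(-F(x)).
    for s in scores_p:
        rows.append((+1, prior / n_p * math.exp(-s)))
    # Unlabeled treated as negatives: weight 1/n_u * exp(F(x)).
    for s in scores_u:
        rows.append((-1, 1.0 / n_u * math.exp(s)))
    # Positives treated as negatives with NEGATIVE weight, which removes
    # the positive examples hidden inside the unlabeled set.
    for s in scores_p:
        rows.append((-1, -prior / n_p * math.exp(s)))
    return rows

def pu_exponential_loss(scores_p, scores_u, prior):
    """Empirical exponential loss on PU data: the sum of the weights above."""
    return sum(w for _, w in derived_pn_dataset(scores_p, scores_u, prior))

rows = derived_pn_dataset([0.5, -0.2], [0.1, -0.4, 0.3], prior=0.3)
num_negative_weights = sum(1 for _, w in rows if w < 0)
```

In this sketch the weights depend on the current scores, so they change every boosting round, and one row per positive example carries a negative weight. That is the feature that keeps the usual AdaBoost weak-learner and step-size rules from applying unchanged.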

