LG · Mar 14, 2022

Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data

arXiv:2203.07206v1 · 1 citation · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses a practical issue for data scientists in scenarios with limited labeled data, offering incremental improvements by adapting existing methods to handle unreliable unlabeled data.

The paper tackles the problem of binary classification with only one labeled class by comparing One-Class (OC) and Positive Unlabeled (PU) learning, finding that PU algorithms are not always superior when unlabeled data is unreliable, and proposes robust modifications to OC algorithms with guidelines for practical use.

When dealing with binary classification of data with only one labeled class, data scientists employ two main approaches, namely One-Class (OC) classification and Positive Unlabeled (PU) learning. The former learns only from labeled positive data, whereas the latter also uses unlabeled data to improve overall performance. Since PU learning uses more data, one might assume that whenever unlabeled data is available, the go-to algorithms should come from the PU group. However, we find that this is not always the case if unlabeled data is unreliable, i.e., contains limited or biased latent negative data. We perform an extensive experimental study of a wide range of state-of-the-art OC and PU algorithms across scenarios that differ in how reliable the unlabeled data is. Furthermore, we propose PU modifications of state-of-the-art OC algorithms that are robust to unreliable unlabeled data, as well as a guideline for similarly modifying other OC algorithms. Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data is reliable and to use the proposed modifications of state-of-the-art OC algorithms otherwise. Additionally, we outline procedures to distinguish between reliable and unreliable unlabeled data using statistical tests.
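The abstract does not say which statistical tests the authors use. As a hedged illustration only, one plausible check for the "limited latent negative data" case is a two-sample test of whether the unlabeled sample U can be distinguished from the labeled positive sample P at all: if U looks just like P, it probably contains few negatives for a PU learner to exploit. The sketch below is a permutation test on the absolute difference of means of a single 1-D feature or score. The helper names `permutation_test` and `has_detectable_negatives`, the choice of statistic, and the 0.05 threshold are all assumptions, not the paper's procedure, and this check cannot detect the "biased negatives" case.

```python
import random
from statistics import mean


def permutation_test(positive, unlabeled, n_permutations=2000, seed=0):
    """Hypothetical helper: permutation test on |mean(P) - mean(U)|.

    Returns an approximate p-value for the null hypothesis that the labeled
    positives and the unlabeled sample come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(positive) - mean(unlabeled))
    pooled = list(positive) + list(unlabeled)
    n_p = len(positive)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign points to "P" and "U" under the null hypothesis.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_p]) - mean(pooled[n_p:]))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def has_detectable_negatives(positive, unlabeled, alpha=0.05):
    """Hypothetical check: True if U differs detectably from P.

    Rejecting "U ~ P" suggests U holds a noticeable share of latent negatives;
    it says nothing about whether those negatives are biased.
    """
    return permutation_test(positive, unlabeled) < alpha


if __name__ == "__main__":
    gen = random.Random(1)
    # Synthetic positives centered at 0.
    P = [gen.gauss(0.0, 1.0) for _ in range(100)]
    # Unlabeled set: 70% positive-like points, 30% latent negatives near 5.
    U_mixed = [gen.gauss(0.0, 1.0) for _ in range(70)] + [gen.gauss(5.0, 1.0) for _ in range(30)]
    # Unlabeled set drawn from the positive distribution only.
    U_pos_only = [gen.gauss(0.0, 1.0) for _ in range(100)]
    print(has_detectable_negatives(P, U_mixed))
    print(has_detectable_negatives(P, U_pos_only))
```

A difference of means is chosen only because it needs nothing beyond the standard library; on real, multivariate data a classifier-based or kernel two-sample test would be a more natural (still assumed) choice.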

Code Implementations: 1 repo