LG | ML | Jun 3, 2019

A Variational Approach for Learning from Positive and Unlabeled Data

arXiv:1906.00642v6 | 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for more efficient and stable PU learning methods in domains where negative samples are hard to obtain, though it appears incremental because it builds on existing variational approaches.

The paper tackles the problem of learning binary classifiers from only positive and unlabeled data, a common challenge in applications such as web text classification and fraud detection. It introduces a variational principle that evaluates the modeling error directly, without approximating the negative data distribution or the class prior. The result is a loss function that is efficient to compute, and the authors report improved performance on benchmarks.

Learning binary classifiers only from positive and unlabeled (PU) data is an important and challenging task in many real-world applications, including web text classification, disease gene identification and fraud detection, where negative samples are difficult to verify experimentally. Most recent PU learning methods are based on the conventional misclassification risk used in supervised learning, and they require solving the intractable risk estimation problem by approximating the negative data distribution or the class prior. In this paper, we introduce a variational principle for PU learning that allows us to quantitatively evaluate the modeling error of the Bayesian classifier directly from given data. This leads to a loss function which can be efficiently calculated without any intermediate step or model, and a variational learning method can then be employed to optimize the classifier under general conditions. In addition, the discriminative performance and numerical stability of the variational PU learning method can be further improved by incorporating a margin maximizing loss function. We illustrate the effectiveness of the proposed variational method on a number of benchmark examples.
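
The abstract does not write the loss out. As a rough illustration only, a loss of the form log E_u[Φ(x)] − E_p[log Φ(x)] can be computed directly from classifier outputs Φ(x) in (0, 1] on an unlabeled batch and a positive batch, with no estimate of the class prior or of the negative distribution. This form is an assumption of the sketch and should be checked against the paper's main text; the function name `variational_pu_loss`, the clipping constant `eps`, and the toy scores are hypothetical.

```python
import math


def variational_pu_loss(scores_unlabeled, scores_positive, eps=1e-12):
    """Assumed loss form: log(mean(phi(x_u))) - mean(log(phi(x_p))).

    scores_unlabeled: classifier outputs phi(x) on unlabeled samples.
    scores_positive:  classifier outputs phi(x) on labeled positive samples.
    eps:              hypothetical clipping constant to avoid log(0).
    """
    if not scores_unlabeled or not scores_positive:
        raise ValueError("both batches must be non-empty")

    # Log of the average score over the unlabeled batch.
    mean_u = sum(scores_unlabeled) / len(scores_unlabeled)
    log_mean_u = math.log(max(mean_u, eps))

    # Average log-score over the positive batch.
    mean_log_p = sum(math.log(max(s, eps)) for s in scores_positive) / len(
        scores_positive
    )

    return log_mean_u - mean_log_p


# Toy scores, made up for illustration.
unlabeled = [0.2, 0.8]
positive = [0.8, 0.8]
loss = variational_pu_loss(unlabeled, positive)
```

Note that this expression is unchanged when every score is multiplied by the same positive constant, so on its own it determines the classifier output only up to scale. The sketch omits the margin-maximizing term mentioned in the abstract.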

Code Implementations: 1 repo