LG · May 3, 2024

Soft Label PU Learning

arXiv:2405.01990v1
Originality: Incremental advance
AI Analysis

This work addresses classification with incomplete labels, motivated by domain-specific applications such as anti-cheat services in gaming. It is an incremental advance: it extends existing PU learning methods by incorporating soft labels.

The paper tackles PU (positive-unlabeled) learning, where only some of the positive samples are labeled. It proposes soft label PU learning, which assigns each unlabeled sample a probability of being positive, and designs PU counterparts of TPR, FPR, and AUC that can be computed on validation data and optimized directly. Experiments on public datasets and on real anti-cheat datasets from Tencent games are reported to demonstrate the method's effectiveness.

PU learning refers to the classification problem in which only some of the positive samples are labeled. Existing PU learning methods treat all unlabeled samples equally. However, in many real tasks, common sense or domain knowledge indicates that some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels according to their probabilities of being positive. Because the ground-truth TPR, FPR, and AUC are unknown, we then design PU counterparts of these metrics to evaluate the performance of soft label PU learning methods on validation data. We show that these newly designed PU metrics are good substitutes for the real metrics. We then propose a method that optimizes these metrics. Experiments on public datasets and on real datasets for anti-cheat services from Tencent games demonstrate the effectiveness of the proposed method.
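To make the idea of soft labels concrete, here is a minimal sketch of training a classifier against soft targets. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify the loss, the model, or how soft labels are derived. The sketch assumes a plain logistic regression trained with cross-entropy against targets in [0, 1], and the soft label values (1.0, 0.6, 0.1), the noisy "side signal", and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_cross_entropy(p, soft_y, eps=1e-12):
    """Mean cross-entropy between predictions p and soft targets in [0, 1]."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(soft_y * np.log(p) + (1 - soft_y) * np.log(1 - p)))

def fit_soft_label_logreg(X, soft_y, lr=0.1, n_steps=500):
    """Gradient descent on the soft-target cross-entropy (hypothetical helper)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)
        grad = p - soft_y            # derivative of the loss w.r.t. the logits
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

# Toy data: one feature, positives centered at +2, negatives at -2.
rng = np.random.default_rng(0)
n = 200
is_pos = np.concatenate([np.ones(n, dtype=bool), np.zeros(n, dtype=bool)])
X = np.where(is_pos, 2.0, -2.0)[:, None] + rng.normal(size=(2 * n, 1))

# Only the first 50 positives are labeled; everything else is unlabeled.
labeled = np.zeros(2 * n, dtype=bool)
labeled[:50] = True

# Assumed domain knowledge: a noisy side signal flags "suspicious" samples.
side_signal = is_pos ^ (rng.random(2 * n) < 0.2)
soft_y = np.where(side_signal, 0.6, 0.1)   # hypothetical soft labels for unlabeled data
soft_y[labeled] = 1.0                      # labeled positives keep a hard label

initial_loss = soft_cross_entropy(sigmoid(np.zeros(2 * n)), soft_y)
w, b = fit_soft_label_logreg(X, soft_y)
final_loss = soft_cross_entropy(sigmoid(X @ w + b), soft_y)
```

In contrast to treating every unlabeled sample as an equally weighted negative, the soft targets let samples judged more likely to be positive pull the decision boundary accordingly. The PU counterparts of TPR, FPR, and AUC are not defined in the abstract, so they are not sketched here.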

