AUC Optimization from Multiple Unlabeled Datasets
The work addresses a challenging weakly supervised learning scenario, but it appears incremental because it adapts existing AUC optimization methods to a specific unlabeled-data setting.
The paper tackles AUC optimization from multiple unlabeled datasets with limited class prior knowledge. It proposes U$^m$-AUC, which converts the U$^m$ data into a multi-label AUC optimization problem, and it supports the approach with both theoretical and empirical results.
Weakly supervised learning aims to empower machine learning when perfect supervision is unavailable, and it has drawn great attention from researchers. Among the various types of weak supervision, one of the most challenging cases is learning from multiple unlabeled (U) datasets with only a little knowledge of the class priors, or U$^m$ learning for short. In this paper, we study the problem of building an AUC (area under the ROC curve) optimization model from multiple unlabeled datasets, which maximizes the pairwise ranking ability of the classifier. We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem and can be trained efficiently. We show that the proposed U$^m$-AUC is effective both theoretically and empirically.
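To make "pairwise ranking ability" concrete, the following is a minimal sketch, not the paper's U$^m$-AUC algorithm. It computes the empirical AUC as the fraction of correctly ordered pairs, then checks a standard identity that follows from writing each unlabeled set as a mixture of positives and negatives: if two unlabeled sets have class priors $\pi_i$ and $\pi_j$, the AUC measured between them equals $0.5 + (\pi_i - \pi_j)(\mathrm{AUC} - 0.5)$. The Gaussian score model, the priors 0.8 and 0.2, the sample size, and the helper names `pairwise_auc` and `sample_unlabeled` are all assumptions made for illustration.

```python
import random


def pairwise_auc(high_scores, low_scores):
    """Fraction of (high, low) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for h in high_scores:
        for l in low_scores:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high_scores) * len(low_scores))


def sample_unlabeled(prior, size, rng):
    """Toy unlabeled set: each score comes from a positive with probability `prior`.

    Assumption for illustration: positives score ~ N(1, 1), negatives ~ N(0, 1).
    """
    return [
        rng.gauss(1.0, 1.0) if rng.random() < prior else rng.gauss(0.0, 1.0)
        for _ in range(size)
    ]


rng = random.Random(0)
n = 600

# Fully labeled reference: the AUC we would measure with perfect supervision.
positives = [rng.gauss(1.0, 1.0) for _ in range(n)]
negatives = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_auc = pairwise_auc(positives, negatives)

# Two unlabeled sets that differ only in their (assumed) class priors.
prior_hi, prior_lo = 0.8, 0.2
u_hi = sample_unlabeled(prior_hi, n, rng)
u_lo = sample_unlabeled(prior_lo, n, rng)

# AUC measured between the unlabeled sets, treating u_hi as the "positive" side.
between_sets_auc = pairwise_auc(u_hi, u_lo)

# Value predicted by the mixture identity.
predicted = 0.5 + (prior_hi - prior_lo) * (true_auc - 0.5)

print(f"true AUC:         {true_auc:.3f}")
print(f"between-sets AUC: {between_sets_auc:.3f}")
print(f"predicted:        {predicted:.3f}")
```

Because the between-sets AUC is an increasing linear function of the true AUC whenever $\pi_i > \pi_j$, ranking a higher-prior set above a lower-prior set pushes the true AUC in the same direction. This is why only the ordering of the class priors, rather than their exact values, is needed in this toy setting.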