Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data
This work offers machine learning practitioners a semi-supervised classification method that drops common distributional assumptions such as the cluster assumption, though it appears incremental because it builds on existing PU classification techniques.
The paper extends positive and unlabeled (PU) classification to also use negative data, yielding a semi-supervised classification approach that avoids distributional assumptions such as the cluster assumption. It establishes generalization error bounds that decrease as the number of unlabeled samples grows, and it demonstrates the usefulness of the proposed methods through experiments.
Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised classification approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised classification methods. Through experiments, we demonstrate the usefulness of the proposed methods.
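To make "unlabeled data for risk evaluation" concrete, the following is a minimal sketch, not the paper's implementation. It relies on the standard identity that the unlabeled distribution is a mixture of the positive and negative class-conditionals, so the negative part of the classification risk can be rewritten using only positive and unlabeled samples. The class prior `prior` is assumed known, the sigmoid loss is an illustrative choice, and the convex combination with weight `gamma` is only one plausible way to incorporate negative data; none of these names or choices are taken from the abstract.

```python
import math
import random
from statistics import fmean


def sigmoid_loss(margin):
    """Illustrative bounded loss: l(m) = 1 / (1 + exp(m))."""
    return 1.0 / (1.0 + math.exp(margin))


def pn_risk(g_pos, g_neg, prior, loss=sigmoid_loss):
    """Ordinary supervised risk from positive and negative scores g(x)."""
    pos_term = prior * fmean(loss(s) for s in g_pos)
    neg_term = (1.0 - prior) * fmean(loss(-s) for s in g_neg)
    return pos_term + neg_term


def pu_risk(g_pos, g_unl, prior, loss=sigmoid_loss):
    """Risk from positive and unlabeled scores only.

    Uses E_U[l(-g)] = prior * E_P[l(-g)] + (1 - prior) * E_N[l(-g)],
    so the negative term equals E_U[l(-g)] - prior * E_P[l(-g)].
    """
    pos_term = prior * fmean(loss(s) for s in g_pos)
    neg_term = fmean(loss(-s) for s in g_unl) - prior * fmean(loss(-s) for s in g_pos)
    return pos_term + neg_term


def combined_risk(g_pos, g_neg, g_unl, prior, gamma, loss=sigmoid_loss):
    """Hypothetical convex combination of the two risks (an assumption)."""
    return (1.0 - gamma) * pn_risk(g_pos, g_neg, prior, loss) + gamma * pu_risk(
        g_pos, g_unl, prior, loss
    )


# Synthetic 1-D data: positives ~ N(+1, 1), negatives ~ N(-1, 1).
rng = random.Random(0)
prior = 0.4  # assumed known class prior p(y = +1)
n = 20000
x_pos = [rng.gauss(1.0, 1.0) for _ in range(n)]
x_neg = [rng.gauss(-1.0, 1.0) for _ in range(n)]
x_unl = [
    rng.gauss(1.0, 1.0) if rng.random() < prior else rng.gauss(-1.0, 1.0)
    for _ in range(n)
]

# Fixed classifier g(x) = x, so the scores are the inputs themselves.
r_pn = pn_risk(x_pos, x_neg, prior)
r_pu = pu_risk(x_pos, x_unl, prior)
r_mix = combined_risk(x_pos, x_neg, x_unl, prior, gamma=0.5)
print(f"PN risk: {r_pn:.4f}  PU risk: {r_pu:.4f}  combined: {r_mix:.4f}")
```

With enough samples the two estimates should agree closely, which illustrates why no cluster-style assumption is needed in this sketch: the unlabeled data enter the risk estimate itself and are not used as a regularizer.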