ML · LG · Oct 15, 2017

Information-Theoretic Representation Learning for Positive-Unlabeled Classification

Tomoya Sakai, Gang Niu, Masashi Sugiyama

arXiv:1710.05359v4 1.0

Originality: Highly original

AI Analysis

This work addresses a key bottleneck in weakly supervised classification for domains with high-dimensional data, offering a novel preprocessing method that enhances PU classification performance.

The paper tackles positive-unlabeled (PU) classification, where existing methods depend on an accurate class-prior estimate that is hard to obtain for high-dimensional data. It proposes a representation learning method based on information maximization that itself needs no class-prior estimate and can therefore serve as a preprocessing step. Combined with deep neural networks, the learned representation improves the accuracy of subsequent class-prior estimation, which the authors report leads to state-of-the-art PU classification performance.

Recent advances in weakly supervised classification allow us to train a classifier only from positive and unlabeled (PU) data. However, existing PU classification methods typically require an accurate estimate of the class-prior probability, which is a critical bottleneck particularly for high-dimensional data. This problem has been commonly addressed by applying principal component analysis in advance, but such unsupervised dimension reduction can collapse underlying class structure. In this paper, we propose a novel representation learning method from PU data based on the information-maximization principle. Our method does not require class-prior estimation and thus can be used as a preprocessing method for PU classification. Through experiments, we demonstrate that our method combined with deep neural networks highly improves the accuracy of PU class-prior estimation, leading to state-of-the-art PU classification performance.
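To make the class-prior dependence concrete, here is a minimal sketch of the unbiased PU risk estimator commonly used in the PU literature. It is not the representation learning method proposed in the paper. The sigmoid loss, the toy 1-D Gaussian data, the identity scorer, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid_loss(margin):
    # Smooth surrogate for the 0-1 loss; satisfies l(m) + l(-m) = 1.
    return 1.0 / (1.0 + np.exp(margin))

def pu_risk(scores_pos, scores_unl, class_prior):
    """Unbiased risk estimate from positive and unlabeled scores only.

    R(g) = pi * E_P[l(g(x))] + E_U[l(-g(x))] - pi * E_P[l(-g(x))]
    where pi is the class prior p(y = +1), which must be supplied.
    """
    pos_term = class_prior * np.mean(sigmoid_loss(scores_pos))
    # The negative-class risk is recovered by subtracting the positive
    # contribution (weighted by the class prior) from the unlabeled data.
    neg_term = (np.mean(sigmoid_loss(-scores_unl))
                - class_prior * np.mean(sigmoid_loss(-scores_pos)))
    return pos_term + neg_term

def make_toy_pu_data(true_prior=0.3, n_pos=5000, n_unl=20000, seed=0):
    # Hypothetical toy setup: positives ~ N(+2, 1), negatives ~ N(-2, 1).
    rng = np.random.default_rng(seed)
    x_pos = rng.normal(2.0, 1.0, n_pos)
    is_pos = rng.random(n_unl) < true_prior
    x_unl = np.where(is_pos,
                     rng.normal(2.0, 1.0, n_unl),
                     rng.normal(-2.0, 1.0, n_unl))
    return x_pos, x_unl

# Score each point with the identity function g(x) = x and compare the
# risk estimate under the true prior (0.3) and a misspecified one (0.6).
x_pos, x_unl = make_toy_pu_data()
for prior in (0.3, 0.6):
    print(f"assumed prior = {prior}: estimated risk = {pu_risk(x_pos, x_unl, prior):.3f}")
```

Because the class prior multiplies two of the three terms, a misspecified prior shifts the estimated risk even though the data and the scorer are unchanged. This sensitivity is the bottleneck the abstract refers to.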
