LG · Dec 24, 2013

Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data

Fengqi Li, Chuang Yu, Nanhai Yang, Feng Xia, Guangming Li, Fatemeh Kaveh-Yazdy

arXiv:1312.6807v1 · 10 citations

Originality: Incremental advance

AI Analysis

This paper addresses the class-imbalance problem in semi-supervised learning and is aimed at practitioners using graph-based methods. The advance is incremental because it builds on existing oversampling techniques.

The paper tackles semi-supervised learning on imbalanced labeled datasets, where majority classes skew the class boundaries. It proposes an iterative oversampling method that selects unlabeled samples and adds them to the minority classes to balance the labeled set, and it reports improved performance over state-of-the-art methods on UCI and MNIST datasets.

Transductive graph-based semi-supervised learning methods usually build an undirected graph that uses both labeled and unlabeled samples as vertices. These methods propagate the label information of labeled samples to their neighbors along the edges in order to predict the labels of unlabeled samples. Most popular semi-supervised learning approaches are sensitive to the initial label distribution, which is a problem when the labeled dataset is imbalanced: the class boundary is severely skewed by the majority classes. In this paper, we propose a simple and effective approach that alleviates the unfavorable influence of the imbalance problem by iteratively selecting a few unlabeled samples and adding them to the minority classes, forming a balanced labeled dataset for the subsequent learning methods. Experiments on UCI datasets and the MNIST handwritten digits dataset show that the proposed approach outperforms existing state-of-the-art methods.
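
The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Euclidean distance and assumes that, at each iteration, the unlabeled sample closest to any current member of a minority class is added to that class until every class matches the largest one. The function name `nearest_neighbor_oversample` and its return values are hypothetical.

```python
import numpy as np


def nearest_neighbor_oversample(X_labeled, y_labeled, X_unlabeled):
    """Hypothetical sketch: grow each minority class with nearby unlabeled samples.

    Assumption (not from the paper): a candidate's score is its Euclidean
    distance to the closest current member of the class being grown.
    """
    X_l = np.asarray(X_labeled, dtype=float)
    y_l = np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled, dtype=float)
    available = np.ones(len(X_u), dtype=bool)  # unlabeled samples not yet used

    classes, counts = np.unique(y_l, return_counts=True)
    target = counts.max()  # every class is grown to the majority-class size

    for cls, count in zip(classes, counts):
        members = X_l[y_l == cls]
        for _ in range(target - count):
            if not available.any():
                break  # no unlabeled samples left to draw from
            candidates = np.flatnonzero(available)
            # Distance from each candidate to its closest current class member.
            diffs = X_u[candidates][:, None, :] - members[None, :, :]
            dists = np.linalg.norm(diffs, axis=2).min(axis=1)
            pick = candidates[np.argmin(dists)]
            # Add the picked sample to the class, then repeat with the
            # enlarged class (the "iterative" part of the sketch).
            members = np.vstack([members, X_u[pick]])
            X_l = np.vstack([X_l, X_u[pick]])
            y_l = np.append(y_l, cls)
            available[pick] = False

    return X_l, y_l, available
```

The balanced arrays returned here could then be passed to any graph-based label propagation method, with the samples still flagged in `available` remaining as the unlabeled vertices.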
