LG ML Mar 17, 2020

Pool-Based Unsupervised Active Learning for Regression Using Iterative Representativeness-Diversity Maximization (iRDM)

arXiv:2003.07658v2
44 citations
AI Analysis

This paper addresses the challenge of selecting samples to label in regression tasks when no labels are available yet. The contribution is incremental, as it builds on existing active learning frameworks.

The paper tackles unsupervised active learning for regression by proposing iRDM, which balances the representativeness and the diversity of the selected samples without using any label information. In the authors' experiments on 12 datasets, it significantly outperforms supervised active learning for regression when the number of labeled samples is small.

Active learning (AL) selects the most beneficial unlabeled samples to label, and hence a better machine learning model can be trained from the same number of labeled samples. Most existing active learning for regression (ALR) approaches are supervised, which means the sampling process must use some label information, or an existing regression model. This paper considers completely unsupervised ALR, i.e., how to select the samples to label without knowing any true label information. We propose a novel unsupervised ALR approach, iterative representativeness-diversity maximization (iRDM), to optimally balance the representativeness and the diversity of the selected samples. Experiments on 12 datasets from various domains demonstrated its effectiveness. Our iRDM can be applied to both linear regression and kernel regression, and it even significantly outperforms supervised ALR when the number of labeled samples is small.
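
The balance between representativeness and diversity described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no procedural details, so the use of k-means clustering, the scoring rule (distance to the nearest other selected sample minus mean distance to the sample's own cluster), the stopping rule, and all function names (`kmeans`, `select_rd`) are assumptions made purely for illustration. Note that no labels are used anywhere, only the input features.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of coordinate tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def select_rd(points, k, max_iter=10, seed=0):
    """Pick one sample index per cluster, trading representativeness against diversity.

    Hypothetical illustration only; the scoring rule is an assumption,
    not taken from the paper.
    """
    labels = kmeans(points, k, seed=seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    clusters = [members for members in clusters if members]  # drop empty clusters

    def rep_cost(i, members):
        # Mean distance to the cluster's members (self included, contributing 0).
        # A lower cost means the sample is more representative of its cluster.
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    # Initialization: the most representative sample of each cluster.
    selected = [min(members, key=lambda i: rep_cost(i, members))
                for members in clusters]

    for _ in range(max_iter):
        previous = list(selected)
        for c, members in enumerate(clusters):
            others = [points[s] for j, s in enumerate(selected) if j != c]

            def score(i):
                # Diversity: distance to the nearest sample selected from another cluster.
                diversity = min((math.dist(points[i], o) for o in others), default=0.0)
                return diversity - rep_cost(i, members)

            selected[c] = max(members, key=score)
        if selected == previous:  # selection is stable, stop iterating
            break
    return selected


# Two well-separated groups of unlabeled samples (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
selected = select_rd(points, k=2)
print("indices to label:", selected)
```

The selected indices are the samples one would send for labeling before fitting a first regression model. The relative weight of the two terms in `score` is a design choice; here they are simply subtracted with equal weight.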
