LG, AI, ML | Aug 8, 2018

Active Learning for Regression Using Greedy Sampling

arXiv:1808.04245v1 | 183 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of expensive labeling in regression for applications like EEG-based driver drowsiness estimation, but it is incremental as it builds on existing active learning methods.

The paper tackles the problem of reducing labeling costs in regression by proposing two new active learning approaches based on greedy sampling, which select samples to increase diversity in output or both input and output spaces, and shows their effectiveness through experiments on 27 datasets from various domains.

Regression problems are pervasive in real-world applications. Generally a substantial amount of labeled samples are needed to build a regression model with good generalization ability. However, many times it is relatively easy to collect a large number of unlabeled samples, but time-consuming or expensive to label them. Active learning for regression (ALR) is a methodology to reduce the number of labeled samples, by selecting the most beneficial ones to label, instead of random selection. This paper proposes two new ALR approaches based on greedy sampling (GS). The first approach (GSy) selects new samples to increase the diversity in the output space, and the second (iGS) selects new samples to increase the diversity in both input and output spaces. Extensive experiments on 12 UCI and CMU StatLib datasets from various domains, and on 15 subjects on EEG-based driver drowsiness estimation, verified their effectiveness and robustness.
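The two selection rules named in the abstract can be illustrated with a minimal sketch. It assumes that diversity is measured as the minimum distance from a candidate to the already-labeled samples, that output-space distances are computed from the current model's predictions on the unlabeled pool, and that iGS combines the input and output distances by a product. The plain least-squares model and the names fit_linear, predict_linear, and select_next are illustrative choices, not taken from the paper's code.

```python
import numpy as np


def fit_linear(X, y):
    """Fit a least-squares linear model with a bias term (illustrative stand-in)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    """Predict with the weights returned by fit_linear."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def select_next(X_lab, y_lab, X_pool, mode="igs"):
    """Return the index in X_pool of the next sample to label.

    mode="gsy": maximize the minimum output-space distance to labeled samples.
    mode="igs": maximize the minimum product of input- and output-space distances.
    Both scoring rules are assumptions made for this sketch.
    """
    w = fit_linear(X_lab, y_lab)
    y_hat = predict_linear(w, X_pool)

    # Output-space distance between each predicted pool output and each known label.
    d_y = np.abs(y_hat[:, None] - y_lab[None, :])  # shape: (n_pool, n_labeled)

    if mode == "gsy":
        score = d_y.min(axis=1)
    elif mode == "igs":
        # Euclidean input-space distance between each pool sample and each labeled sample.
        d_x = np.linalg.norm(X_pool[:, None, :] - X_lab[None, :, :], axis=2)
        score = (d_x * d_y).min(axis=1)
    else:
        raise ValueError(f"unknown mode: {mode}")

    return int(np.argmax(score))


if __name__ == "__main__":
    # Hypothetical toy data: two labeled points and a pool of three candidates.
    X_lab = np.array([[0.0], [1.0]])
    y_lab = np.array([0.0, 1.0])
    X_pool = np.array([[0.5], [3.0], [1.1]])
    print(select_next(X_lab, y_lab, X_pool, mode="gsy"))
    print(select_next(X_lab, y_lab, X_pool, mode="igs"))
```

In an active learning loop, the selected sample would be labeled, moved from the pool to the labeled set, and the model refit before the next selection. Any regressor could replace the least-squares model used here.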

Code Implementations: 1 repo
