LGAIDSSep 17, 2012

Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields

arXiv:1209.3694v14 citations
Originality Incremental advance
AI Analysis

This work provides theoretical guarantees for active learning and survey problems on graph-based datasets, though it is incremental as it builds on existing V-optimality methods.

The paper tackled the lack of worst-case bounds for V-optimality in batch active learning on Gaussian random fields by proving its submodularity, enabling a greedy algorithm with a (1-1/e) approximation guarantee, and it showed that V-optimality outperforms mutual information gain criteria in practice.

Many real-world datasets can be represented in the form of a graph whose edge weights designate similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the predictive covariance Sigma (V-optimality) on GRFs has proven successful in batch active learning classification problems with budget constraints. However, its worst-case bound has been missing. We show that the V-optimality on GRFs as a function of the batch query set is submodular and hence its greedy selection algorithm guarantees an (1-1/e) approximation ratio. Moreover, GRF models have the absence-of-suppressor (AofS) condition. For active survey problems, we propose a similar survey criterion which minimizes 1'(Sigma)1. In practice, V-optimality criterion performs better than GPs with mutual information gain criteria and allows nonuniform costs for different nodes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes