LG · OC · Jun 5, 2021

Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach

arXiv:2106.02968v4 · 44 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient sample selection in active learning for deep learning, especially under tight labeling budgets. The advance is incremental, since it builds on existing optimization methods and feature-learning techniques.

The paper tackles the problem of choosing a small core subset of the unlabeled pool to label in low-budget active learning. It formulates core-set selection as an integer optimization problem that minimizes the Wasserstein distance between the core set and the unlabeled pool. The method is competitive with baselines overall and outperforms them when less than 1% of the data is labeled.
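To make the optimization concrete, below is one standard way to write "pick B of the N pool points so that the uniform distribution on them is as close as possible, in Wasserstein-1 distance, to the uniform distribution on the whole pool" as a mixed-integer program. This is a sketch under stated assumptions, not necessarily the paper's exact formulation: x_1 through x_N are the latent features of the pool points, D_ij = ‖x_i − x_j‖ is the ground cost, the binary w_j marks whether point j is selected for labeling, and Γ is the transport plan.

$$
\begin{aligned}
\min_{w,\;\Gamma}\quad & \sum_{i=1}^{N}\sum_{j=1}^{N} D_{ij}\,\Gamma_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{N} \Gamma_{ij} = \tfrac{1}{N}, \qquad i = 1,\dots,N \\
& \sum_{i=1}^{N} \Gamma_{ij} = \tfrac{w_j}{B}, \qquad j = 1,\dots,N \\
& \sum_{j=1}^{N} w_j = B, \qquad \Gamma_{ij} \ge 0, \qquad w_j \in \{0,1\}
\end{aligned}
$$

The binary vector w is what makes this an integer program. For a fixed w, what remains in Γ is an ordinary optimal-transport linear program, which is the kind of structure that decomposition methods such as Generalized Benders Decomposition are designed to exploit.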

Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. The large scale of data sets used in deep learning forces most sample selection strategies to employ efficient heuristics. This paper introduces an integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. We demonstrate that this problem can be tractably solved with a Generalized Benders Decomposition algorithm. Our strategy uses high-quality latent features that can be obtained by unsupervised learning on the unlabeled pool. Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled.
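The following toy-scale Python sketch solves the core-set selection problem exactly on a tiny pool. It assumes uniform weights on the pool and on the selected set, and Euclidean distances between feature vectors. The `features` array stands in for the latent features the abstract mentions; here it is just a hand-made 1-D example. Important: this swaps in SciPy's generic MILP solver (`scipy.optimize.milp`, backed by HiGHS) for the paper's Generalized Benders Decomposition. It builds a dense problem with N² + N variables, so it only illustrates the objective and does not reflect how the method scales. The function name `wasserstein_core_set` is hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp
from scipy.spatial.distance import cdist


def wasserstein_core_set(features: np.ndarray, budget: int):
    """Hypothetical helper: pick `budget` pool points whose uniform distribution
    is closest in Wasserstein-1 distance to the uniform pool distribution.

    Uses SciPy's generic MILP solver (HiGHS), not Generalized Benders
    Decomposition, so it is only practical for toy pools.
    """
    n = features.shape[0]
    cost = cdist(features, features)  # D_ij = ||x_i - x_j||_2
    num_gamma = n * n                 # transport plan Gamma_ij, flattened row-major

    # Objective: sum_ij D_ij * Gamma_ij; the selection vector w has zero cost.
    c = np.concatenate([cost.ravel(), np.zeros(n)])

    rows, lb, ub = [], [], []
    # Every pool point ships exactly 1/n mass: sum_j Gamma_ij = 1/n.
    for i in range(n):
        row = np.zeros(num_gamma + n)
        row[i * n:(i + 1) * n] = 1.0
        rows.append(row); lb.append(1.0 / n); ub.append(1.0 / n)
    # Point j receives 1/budget mass if selected, else nothing:
    # sum_i Gamma_ij - w_j / budget = 0.
    for j in range(n):
        row = np.zeros(num_gamma + n)
        row[j:num_gamma:n] = 1.0
        row[num_gamma + j] = -1.0 / budget
        rows.append(row); lb.append(0.0); ub.append(0.0)
    # Exactly `budget` points are selected for labeling: sum_j w_j = budget.
    row = np.zeros(num_gamma + n)
    row[num_gamma:] = 1.0
    rows.append(row); lb.append(budget); ub.append(budget)

    constraints = LinearConstraint(np.array(rows), lb, ub)
    integrality = np.concatenate([np.zeros(num_gamma), np.ones(n)])  # only w is integer
    bounds = Bounds(
        np.zeros(num_gamma + n),
        np.concatenate([np.full(num_gamma, np.inf), np.ones(n)]),  # Gamma >= 0, w in [0, 1]
    )

    res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    selected = np.flatnonzero(res.x[num_gamma:] > 0.5)
    return selected, res.fun


# Toy pool: two tight 1-D clusters of three points each, budget of two labels.
pool = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
selected, dist = wasserstein_core_set(pool, budget=2)
print(selected, dist)  # expected to pick indices 1 and 4, the middle point of each cluster
```

With a budget of two, putting one label in each cluster keeps all transported mass inside its cluster, and choosing each cluster's middle point minimizes the within-cluster transport cost. That matches the intuition that a Wasserstein-optimal core set covers the pool's modes in proportion to their mass.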

