ML · LG · Aug 6, 2019

Bayesian Batch Active Learning as Sparse Subset Approximation

arXiv:1908.02144v4 · 150 citations
AI Analysis

This work addresses scalable active learning, a problem faced by researchers and practitioners with high labeling costs, and offers an incremental improvement over existing methods.

The paper tackles two weaknesses of greedy active learning in large-scale settings, computational infeasibility and negligible model change per query. It introduces a Bayesian batch active learning approach that approximates the complete data posterior to produce diverse batches, and demonstrates its benefits on large-scale regression and classification tasks.

Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
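The batch construction described in the abstract can be sketched as a sparse subset approximation. The goal is to find nonnegative weights w, most of them zero, such that the weighted sum of per-point expected log-likelihoods Σ_n w_n L_n(θ) approximates the full-pool sum Σ_n L_n(θ), and then to label the points with w_n > 0. The minimal NumPy sketch below makes several assumptions. It uses a Bayesian linear regression model with Gaussian noise and a Gaussian posterior N(mu, Sigma), and it represents each L_n by its values at J posterior samples, which is the random-projection idea. It builds w greedily with a Frank-Wolfe loop using exact line search, a standard solver for this kind of sparse approximation. The function names, the Gaussian model, and the row-centering step are all assumptions of this sketch, not the paper's reference implementation.

```python
import numpy as np


def projected_loglik(X, mu, Sigma, noise_var, n_samples, rng):
    """Random-projection features for a hypothetical Bayesian linear regression model.

    Row n holds L_n(theta) = E_{y_n}[log N(y_n | x_n . theta, noise_var)] evaluated at
    n_samples posterior draws theta_j ~ N(mu, Sigma), scaled by 1/sqrt(J) so that
    F @ F.T estimates posterior-weighted inner products <L_n, L_m>.
    """
    thetas = rng.multivariate_normal(mu, Sigma, size=n_samples)        # (J, D)
    pred_mean = X @ mu                                                  # (N,)
    pred_var = np.einsum("nd,de,ne->n", X, Sigma, X) + noise_var        # (N,)
    fitted = X @ thetas.T                                               # (N, J)
    # Closed-form expectation over y_n ~ N(pred_mean, pred_var):
    # E[(y - a)^2] = pred_var + (pred_mean - a)^2.
    L = -0.5 * np.log(2 * np.pi * noise_var) - (
        pred_var[:, None] + (pred_mean[:, None] - fitted) ** 2
    ) / (2 * noise_var)
    # Assumption of this sketch: remove theta-independent offsets by centering each row.
    L = L - L.mean(axis=1, keepdims=True)
    return L / np.sqrt(n_samples)


def frank_wolfe_batch(F, batch_size):
    """Greedily build sparse weights w >= 0 so that w @ F approximates F.sum(axis=0).

    Assumes every row of F is nonzero. Returns the indices with w_n > 0 and w.
    """
    target = F.sum(axis=0)                   # projected full-pool objective
    norms = np.linalg.norm(F, axis=1)        # sigma_n
    total = norms.sum()                      # sigma
    w = np.zeros(F.shape[0])
    approx = np.zeros(F.shape[1])            # current w @ F, kept in sync with w
    for _ in range(batch_size):
        residual = target - approx
        # Choose the point whose normalised direction best aligns with the residual.
        f = int(np.argmax(F @ residual / norms))
        vertex = (total / norms[f]) * F[f]
        d = vertex - approx
        denom = d @ d
        if denom == 0.0:
            break
        # Exact line search on ||target - approx||^2, clipped to a valid step.
        gamma = float(np.clip(d @ residual / denom, 0.0, 1.0))
        w *= 1.0 - gamma
        w[f] += gamma * total / norms[f]
        approx = (1.0 - gamma) * approx + gamma * vertex
    return np.flatnonzero(w > 0), w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(200, 3))       # hypothetical unlabeled pool
    F = projected_loglik(X_pool, np.zeros(3), np.eye(3), noise_var=1.0, n_samples=50, rng=rng)
    batch, w = frank_wolfe_batch(F, batch_size=10)
    print("query indices:", batch)
```

Because a point can be picked more than once, the returned batch may hold fewer than `batch_size` points. After a point is chosen, the residual shrinks along its direction, so near-duplicate points tend to score lower on later steps. This is how a construction of this kind avoids the correlated queries that naive top-k selection produces.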

Code Implementations: 2 repos