CV, LG | Nov 19, 2019

Rethinking deep active learning: Using unlabeled data at model training

arXiv:1911.08177v1 | 88 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing labeling costs in machine learning, particularly for image classification, by proposing an incremental enhancement to active learning pipelines.

The paper brings unlabeled data into model training in active learning, rather than using it only for acquisition. It reports that this yields a surprising accuracy improvement in image classification, and it goes on to explore very small label budgets, down to one label per class.

Active learning typically focuses on training a model on few labeled examples alone, while unlabeled ones are only used for acquisition. In this work we depart from this setting by using both labeled and unlabeled data during model training across active learning cycles. We do so by using unsupervised feature learning at the beginning of the active learning pipeline and semi-supervised learning at every active learning cycle, on all available data. The former has not been investigated before in active learning, while the study of the latter in the context of deep learning is scarce and recent findings are not conclusive with respect to its benefit. Our idea is orthogonal to acquisition strategies by using more data, much like ensemble methods use more models. By systematically evaluating on a number of popular acquisition strategies and datasets, we find that the use of unlabeled data during model training brings a surprising accuracy improvement in image classification, compared to the differences between acquisition strategies. We thus explore smaller label budgets, even one label per class.
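The pipeline described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not name specific methods, so PCA stands in for unsupervised feature learning, nearest-centroid self-training stands in for semi-supervised learning, and margin sampling stands in for an acquisition strategy. The Gaussian-blob data and every function name here are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def make_data(rng, n_per_class=100, n_classes=3, dim=10):
    # Hypothetical toy data: Gaussian blobs stand in for images.
    centers = rng.normal(scale=4.0, size=(n_classes, dim))
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

def unsupervised_features(X, k=5):
    # Stand-in for unsupervised feature learning: PCA fitted on ALL data, no labels.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def class_probs(F, centroids):
    # Softmax over negative squared distances to each class centroid.
    logits = -((F[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_semi_supervised(F, labeled_idx, y_labeled, n_classes, rounds=3):
    # Supervised start: centroids from the labeled points only.
    centroids = np.stack(
        [F[labeled_idx[y_labeled == c]].mean(axis=0) for c in range(n_classes)]
    )
    # Self-training (assumed stand-in): pseudo-label everything, keep the true
    # labels where known, then refit the centroids on all available data.
    for _ in range(rounds):
        pseudo = class_probs(F, centroids).argmax(axis=1)
        pseudo[labeled_idx] = y_labeled
        centroids = np.stack(
            [F[pseudo == c].mean(axis=0) for c in range(n_classes)]
        )
    return centroids

def acquire(F, centroids, labeled_idx, budget):
    # Margin sampling: pick the unlabeled points with the smallest top-2 gap.
    p = np.sort(class_probs(F, centroids), axis=1)
    margin = p[:, -1] - p[:, -2]
    margin[labeled_idx] = np.inf  # never re-select an already labeled point
    return np.argsort(margin)[:budget]

def run(cycles=3, budget=3, seed=0):
    rng = np.random.default_rng(seed)
    X, y = make_data(rng)
    n_classes = int(y.max()) + 1
    F = unsupervised_features(X)  # once, at the start of the pipeline
    # Initial budget: one label per class.
    labeled = np.array([rng.choice(np.flatnonzero(y == c)) for c in range(n_classes)])
    history = []
    for _ in range(cycles):
        centroids = fit_semi_supervised(F, labeled, y[labeled], n_classes)
        acc = (class_probs(F, centroids).argmax(axis=1) == y).mean()
        history.append((len(labeled), float(acc)))
        labeled = np.concatenate([labeled, acquire(F, centroids, labeled, budget)])
    return history

if __name__ == "__main__":
    for n_labels, acc in run():
        print(f"{n_labels} labels: accuracy {acc:.3f}")
```

The structural point is that `unsupervised_features` and `fit_semi_supervised` both see all the data, while `acquire` can be swapped for any other strategy without touching them, which is the sense in which the abstract calls the idea orthogonal to acquisition.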

Code Implementations: 1 repo