LG | ML | Jan 27, 2023

ActiveLab: Active Learning with Re-Labeling by Multiple Annotators

arXiv:2301.11856v1 | 7 citations | h-index: 27
Originality: Incremental advance
AI Analysis

This paper addresses efficient data labeling for machine learning practitioners working on real-world applications. Its contribution is incremental, as it builds on existing active learning methods.

The paper tackles the problem of training accurate classifiers under a limited annotation budget when multiple annotators provide imperfect labels. In its experiments, ActiveLab reliably trained more accurate classifiers with far fewer annotations than popular active learning methods.

In real-world data labeling applications, annotators often provide imperfect labels. It is thus common to employ multiple annotators to label data with some overlap between their examples. We study active learning in such settings, aiming to train an accurate classifier by collecting a dataset with the fewest total annotations. Here we propose ActiveLab, a practical method to decide what to label next that works with any classifier model and can be used in pool-based batch active learning with one or multiple annotators. ActiveLab automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones. This is a key aspect of producing high-quality labels and trained models within a limited annotation budget. In experiments on image and tabular data, ActiveLab reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
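
To make the re-label vs. new-label decision concrete, the sketch below scores already-labeled and unlabeled examples on one shared scale and sends the lowest-scoring ones to annotators. It is a simplified stand-in, not the estimator from the paper: the scoring rule (a weighted average of the classifier's probability and annotator agreement), the `model_weight` parameter, and all function names are assumptions made for illustration.

```python
from collections import Counter


def consensus_and_agreement(annotations):
    """Majority-vote label and the fraction of annotators who chose it."""
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


def labeled_score(annotations, pred_probs, model_weight=1.0):
    """Hypothetical confidence in the consensus label of a labeled example.

    Averages the classifier's probability for the consensus label with the
    annotator agreement, counting each annotation as one vote and the
    classifier as `model_weight` votes (an assumed weighting).
    """
    label, agreement = consensus_and_agreement(annotations)
    n = len(annotations)
    return (model_weight * pred_probs[label] + n * agreement) / (model_weight + n)


def unlabeled_score(pred_probs):
    """With no annotations, only the classifier's top probability is available."""
    return max(pred_probs)


def select_batch(labeled, unlabeled, batch_size):
    """Return the `batch_size` lowest-scoring examples as (kind, index) pairs."""
    candidates = [
        (labeled_score(a, p), "relabel", i) for i, (a, p) in enumerate(labeled)
    ]
    candidates += [
        (unlabeled_score(p), "new", i) for i, p in enumerate(unlabeled)
    ]
    candidates.sort(key=lambda c: c[0])  # lowest confidence first
    return [(kind, i) for _, kind, i in candidates[:batch_size]]


# Toy pool: (annotations, predicted class probabilities) per labeled example.
labeled = [
    ([0, 0, 0], [0.9, 0.1]),  # annotators agree and the model concurs
    ([0, 1], [0.5, 0.5]),     # annotators disagree and the model is unsure
]
unlabeled = [[0.55, 0.45], [0.99, 0.01]]

picks = select_batch(labeled, unlabeled, batch_size=2)
```

Because both kinds of example share one score, a disputed labeled example can outrank an unlabeled one that the classifier already predicts confidently, which is the trade-off the abstract describes.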

Code Implementations: 1 repo
