LG · DB · Feb 8, 2024

ActiveDP: Bridging Active Learning and Data Programming

arXiv:2402.06056v1 · 3 citations · h-index: 5 · EDBT
Originality: Incremental advance
AI Analysis

This paper addresses the high cost of manual labeling for machine learning practitioners, though it appears incremental because it builds on two existing paradigms, active learning and data programming.

The paper tackles the problem of labeling large datasets efficiently and accurately by combining active learning and data programming, resulting in a framework that outperforms previous methods and performs well under various labeling budgets.

Modern machine learning models require large labelled datasets to achieve good performance, but manually labelling large datasets is expensive and time-consuming. The data programming paradigm enables users to label large datasets efficiently but produces noisy labels, which degrade the downstream model's performance. The active learning paradigm, on the other hand, can acquire accurate labels but only for a small fraction of instances. In this paper, we propose ActiveDP, an interactive framework that bridges active learning and data programming to generate labels with both high accuracy and high coverage, combining the strengths of both paradigms. Experiments show that ActiveDP outperforms previous weak supervision and active learning approaches and consistently performs well under different labelling budgets.
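To make the two paradigms in the abstract concrete, here is a minimal toy sketch in Python. It is not ActiveDP's algorithm. It assumes a majority vote in place of a learned label model, and a simple "query whatever the labeling functions could not decide" rule in place of a real active learning query strategy. All names (`lf_contains_free`, `weak_label`, `combine_labels`, the spam/ham task) are hypothetical and chosen only for illustration.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions for a toy spam (1) / ham (0) task.
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return 0 if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return 1 if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def weak_label(text):
    """Data-programming side: majority vote over non-abstaining functions.

    Returns None when every function abstains or the vote is tied.
    (Assumption: a real system would use a learned label model instead.)
    """
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the top two labels
    return counts[0][0]

def combine_labels(texts, oracle, budget):
    """Active-learning side: spend a limited budget of human labels on
    instances the labeling functions could not decide.

    `oracle` stands in for the human annotator. Returns the combined
    labels and the indices that were sent to the oracle.
    """
    weak = [weak_label(t) for t in texts]
    queried = [i for i, w in enumerate(weak) if w is None][:budget]
    final = list(weak)
    for i in queried:
        final[i] = oracle(texts[i])
    return final, queried
```

With a budget of one query, the labeling functions cover the first two texts below, the tied third text is sent to the annotator, and the fourth stays unlabeled because the budget is exhausted:

```python
texts = ["Free prize!!", "Team meeting at 3pm", "Free meeting", "Hello there"]
labels, queried = combine_labels(texts, oracle=lambda t: 0, budget=1)
```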

