Active Learning from Imperfect Labelers
This work addresses efficient data labeling in machine learning when human or automated annotators are unreliable. It is an incremental improvement that extends active learning to handle labelers who abstain.
The paper tackles active learning with imperfect labelers who can provide incorrect labels or abstain, proposing an algorithm that uses abstention responses to achieve statistical consistency and near-optimal query complexity, adapting query requests based on labeler quality.
We study active learning where the labeler can not only return incorrect labels but also abstain from labeling. We consider different noise and abstention conditions of the labeler. We propose an algorithm that utilizes abstention responses, and analyze its statistical consistency and query complexity under fairly natural assumptions on the noise and abstention rate of the labeler. This algorithm is adaptive in the sense that it automatically requests fewer queries when the labeler is more informed or less noisy. We couple our algorithm with lower bounds to show that, under some technical conditions, it achieves nearly optimal query complexity.
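To make the idea of using abstention responses concrete, here is a minimal sketch in Python. It is not the algorithm proposed in the paper. It is a toy binary search for a one-dimensional threshold classifier on [0, 1], under the illustrative assumption that the labeler abstains only when a query falls close to the decision boundary. The names `make_labeler`, `learn_threshold`, `abstain_width`, `flip_prob`, and `repeats` are all hypothetical, and the majority vote over repeated queries is a generic noise-handling device chosen for illustration.

```python
import random

ABSTAIN = None  # sentinel meaning "the labeler declined to answer"


def make_labeler(theta, abstain_width=0.0, flip_prob=0.0, rng=None):
    """Hypothetical labeler for a threshold classifier on [0, 1].

    Assumption for illustration only: it abstains whenever the query is
    within abstain_width of the true threshold theta; otherwise it returns
    the true label, flipped with probability flip_prob.
    """
    rng = rng or random.Random(0)

    def label(x):
        if abs(x - theta) < abstain_width:
            return ABSTAIN
        y = 1 if x >= theta else 0
        if rng.random() < flip_prob:
            y = 1 - y
        return y

    return label


def learn_threshold(label, tol=1e-3, repeats=1):
    """Binary search that treats abstention as evidence of a nearby boundary.

    Returns (estimate, number_of_queries_used).
    """
    lo, hi = 0.0, 1.0
    queries = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        votes = []
        for _ in range(repeats):
            queries += 1
            answer = label(mid)
            if answer is ABSTAIN:
                # Under the assumption above, mid is near the threshold,
                # so stop early instead of spending more queries.
                return mid, queries
            votes.append(answer)
        if 2 * sum(votes) >= len(votes):
            hi = mid  # majority says label 1: threshold is at or below mid
        else:
            lo = mid  # majority says label 0: threshold is above mid
    return (lo + hi) / 2, queries


# A labeler that never abstains versus one that abstains near the boundary.
est_plain, q_plain = learn_threshold(make_labeler(0.3))
est_abst, q_abst = learn_threshold(make_labeler(0.3, abstain_width=0.1))
```

In this toy setting the abstaining labeler lets the search stop after fewer queries, at the cost of a coarser estimate (accurate only to within `abstain_width`). This trade-off is a property of the sketch, not a result taken from the paper.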