LG, HC · Jul 25, 2023

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

Daniel Kałuża, Andrzej Janusz, Dominik Ślęzak

arXiv:2307.14380v1 · 3.84 citations · h-index: 37

Originality: Highly original

AI Analysis

This work addresses the challenge of balancing label quality against label quantity in active learning, for real-world applications where expert annotations are costly or unreliable.

The paper tackles the problem of faulty data annotations in active learning by proposing two novel annotation unification algorithms that work with sparse and noisy labels. On four public datasets, the authors report that these algorithms outperform both state-of-the-art methods and majority voting at estimating annotator reliability and at assigning actual labels.

Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of the labels used in training. Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice. To tackle this challenge, active learning algorithms are commonly employed to select only the most relevant data for labeling. However, this is possible only when the quality and quantity of labels acquired from experts are sufficient. In many applications, a trade-off is necessary between having multiple annotators label the same samples, which increases label quality, and annotating new samples, which increases the total number of labeled instances. In this paper, we address the issue of faulty data annotations in the context of active learning. In particular, we propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space. The proposed methods require little to no intersection between samples annotated by different experts. Our experiments on four public datasets indicate the robustness and superiority of the proposed methods, in both the estimation of annotator reliability and the assignment of actual labels, over state-of-the-art algorithms and simple majority voting.
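For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of simple majority voting over sparse annotations, together with a naive agreement-with-consensus score per annotator. It is not the paper's proposed method. The toy data, the tie-breaking rule, and the names `majority_vote` and `annotator_agreement` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy data: annotations[sample_id] = {annotator_id: label}.
# Not every annotator labels every sample (sparse annotations).
annotations = {
    "s1": {"a": 1, "b": 1, "c": 0},
    "s2": {"a": 0},
    "s3": {"a": 1, "b": 1, "c": 0},
    "s4": {"b": 0, "c": 0},
}

def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label (assumed rule)."""
    counts = Counter(votes.values())
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)

def annotator_agreement(annotations, consensus):
    """Fraction of each annotator's labels that match the consensus label."""
    hits, totals = Counter(), Counter()
    for sample, votes in annotations.items():
        for annotator, label in votes.items():
            totals[annotator] += 1
            hits[annotator] += int(label == consensus[sample])
    return {a: hits[a] / totals[a] for a in totals}

# Unified label per sample, then a crude reliability proxy per annotator.
consensus = {sample: majority_vote(votes) for sample, votes in annotations.items()}
reliability = annotator_agreement(annotations, consensus)

print(consensus)
print(reliability)
```

In this sketch, sample `s2` has a single annotation, so its consensus label is simply that annotator's label and always counts as agreement. This kind of inflation under little overlap between annotators is one plausible reason a plain voting baseline struggles in the sparse setting the abstract describes.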
