CL · Dec 8, 2020

Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution

David Q. Sun, Hadas Kotek, Christopher Klein, Mayank Gupta, William Li, Jason D. Williams

arXiv:2012.04169v1 31.0991 citations

Originality: Highly original

AI Analysis

This work addresses the problem of noisy human-labeled data in crowdsourced semantic annotation tasks, reporting a reduction in labeling error of as much as 20-30% relative to other common labeling strategies.

This paper introduces Dynamic Automatic Conflict Resolution (DACR), a scalable methodology to estimate and reduce noise in human-labeled data. It reduces labeling error by as much as 20-30% compared to other common labeling strategies, without requiring a ground truth dataset.

This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
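The abstract does not spell out how inter-project inconsistencies are measured, so the following is only a minimal sketch of the general idea, not the authors' DACR procedure. It assumes the same items are labeled in two independent annotation projects, each represented as a dictionary from item id to label, and it uses the fraction of conflicting labels as a ground-truth-free proxy for label noise. The function name `inconsistency_rate` and the example utterance ids and labels are hypothetical.

```python
def inconsistency_rate(labels_a, labels_b):
    """Fraction of shared items whose labels differ between two projects.

    Illustrative assumption: each argument maps item id -> label, and the
    two projects annotated (at least some of) the same items independently.
    """
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("no items were labeled in both projects")
    disagreements = sum(1 for item in shared if labels_a[item] != labels_b[item])
    return disagreements / len(shared)


# Hypothetical labels for four utterances, annotated in two separate projects.
project_a = {"u1": "weather", "u2": "music", "u3": "timer", "u4": "music"}
project_b = {"u1": "weather", "u2": "podcast", "u3": "timer", "u4": "music"}

# Only "u2" conflicts, so one of the four shared items is inconsistent.
rate = inconsistency_rate(project_a, project_b)
print(f"inter-project inconsistency: {rate:.2f}")
```

A measure like this needs no reference labels, which is the property the abstract highlights; how DACR turns such inconsistencies into a noise estimate or a resolution strategy is not stated here.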
