CL LG Nov 2, 2018

A Bayesian Approach for Sequence Tagging with Crowds

arXiv:1811.00780v331.21008 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the need for cheaper, more reliable labeled data in NLP by improving aggregation methods for crowdsourced annotations. The advance is incremental, since it builds on existing Bayesian approaches.

The paper tackles unreliable annotators and span annotation errors in crowdsourced sequence tagging. It proposes a Bayesian method that models sequential dependencies and annotator uncertainty, and reports that it outperforms the previous state of the art on named entity recognition, information extraction and argument mining.

Current methods for sequence tagging, a core task in NLP, are data hungry, which motivates the use of crowdsourcing as a cheap way to obtain labelled data. However, annotators are often unreliable and current aggregation methods cannot capture common types of span annotation errors. To address this, we propose a Bayesian method for aggregating sequence tags that reduces errors by modelling sequential dependencies between the annotations as well as the ground-truth labels. By taking a Bayesian approach, we account for uncertainty in the model due to both annotator errors and the lack of data for modelling annotators who complete few tasks. We evaluate our model on crowdsourced data for named entity recognition, information extraction and argument mining, showing that our sequential model outperforms the previous state of the art. We also find that our approach can reduce crowdsourcing costs through more effective active learning, as it better captures uncertainty in the sequence labels when there are few annotations.
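To make the idea of sequence-aware aggregation concrete, the following is a minimal sketch and not the paper's model. It assumes a much simpler setup: each annotator gets a confusion matrix estimated as a Dirichlet posterior mean (so an annotator with few completed tasks stays close to the prior), and the true tags follow a first-order Markov chain decoded with Viterbi. It models dependencies between the true labels only, not between the annotations themselves. The function names `posterior_mean` and `viterbi_aggregate`, the O/B/I tag set, and all counts and probabilities are illustrative assumptions.

```python
import math

def posterior_mean(counts, prior):
    """Row-normalise observed counts plus Dirichlet pseudo-counts.

    counts[k][l]: times an annotator gave tag l when the true tag was k.
    With zero counts the estimate is just the normalised prior.
    """
    rows = []
    for c_row, p_row in zip(counts, prior):
        total = sum(c_row) + sum(p_row)
        rows.append([(c + p) / total for c, p in zip(c_row, p_row)])
    return rows

def viterbi_aggregate(annotations, confusions, transitions, initial):
    """Most probable true tag sequence given all annotators' tags.

    annotations[a][t] is annotator a's tag at token t (None if missing).
    All probabilities are assumed strictly positive so logs are defined.
    """
    n_tokens = len(annotations[0])
    n_tags = len(initial)

    def log_emit(t, k):
        # Log-likelihood of every observed tag at token t if the true tag is k.
        return sum(math.log(confusions[a][k][tags[t]])
                   for a, tags in enumerate(annotations)
                   if tags[t] is not None)

    score = [math.log(initial[k]) + log_emit(0, k) for k in range(n_tags)]
    back = []
    for t in range(1, n_tokens):
        new_score, pointers = [], []
        for k in range(n_tags):
            best_j = max(range(n_tags),
                         key=lambda j: score[j] + math.log(transitions[j][k]))
            pointers.append(best_j)
            new_score.append(score[best_j]
                             + math.log(transitions[best_j][k])
                             + log_emit(t, k))
        score = new_score
        back.append(pointers)

    # Trace back from the best final tag.
    k = max(range(n_tags), key=lambda j: score[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return path[::-1]

# Illustrative data (assumed): tags O=0, B=1, I=2; three annotators, four tokens.
TAGS = ["O", "B", "I"]
prior = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]             # mildly trusts annotators
seen = [[8, 1, 1], [1, 8, 1], [1, 1, 8]]              # annotator with history
unseen = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]            # annotator with no history
confusions = [posterior_mean(seen, prior),
              posterior_mean(seen, prior),
              posterior_mean(unseen, prior)]
transitions = [[0.7, 0.29, 0.01],                     # O -> I is made unlikely
               [0.3, 0.1, 0.6],
               [0.4, 0.1, 0.5]]
initial = [0.8, 0.19, 0.01]
annotations = [[0, 1, 2, 0],                          # O B I O
               [0, 2, 2, 0],                          # O I I O (span starts with I)
               [0, 0, 2, 0]]                          # O O I O (span start missed)

if __name__ == "__main__":
    path = viterbi_aggregate(annotations, confusions, transitions, initial)
    print([TAGS[k] for k in path])
```

In this toy input the three annotators each give a different tag for the second token, so a per-token vote has no majority there. The transition matrix, which makes an I directly after an O unlikely, is what lets the decoder settle on a well-formed span. A fully Bayesian treatment would keep distributions over the confusion and transition parameters rather than the point estimates used here.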
