CL LG ML

Mar 4, 2015

Statistical modality tagging from rule-based annotations and crowdsourcing

Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme

arXiv:1503.01190v1

14.328 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the sparsity of modality triggers in linguistic tagging. Its contribution is incremental: it improves data collection methods for a specific NLP task.

The authors train an automatic modality tagger by first collecting candidate sentences with a high-recall rule-based tagger and then crowdsourcing annotations for them on Mechanical Turk. The resulting training data is used to train a multi-class SVM tagger that delivers good performance.

We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.
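The data-collection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the trigger lexicon, the label names, and the majority-vote aggregation of crowd annotations are all assumptions made for the example, and the final SVM training step is left out.

```python
import re
from collections import Counter

# Hypothetical trigger lexicon (word -> candidate modality label).
# The lexicon and label set actually used in the paper are not given here.
TRIGGERS = {
    "must": "Requirement",
    "need": "Requirement",
    "want": "Want",
    "able": "Ability",
    "try": "Effort",
}


def candidate_triggers(sentence):
    """High-recall rule: return (token, label) for every lexicon hit."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(tok, TRIGGERS[tok]) for tok in tokens if tok in TRIGGERS]


def majority_label(votes):
    """Collapse crowd votes into one label; None without a strict majority.

    Majority voting is an assumed aggregation scheme for this sketch.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


sentences = [
    "We want to finish the report today.",
    "The meeting starts at noon.",
    "She was able to fix the bug.",
]

# Keep only sentences with at least one candidate trigger; these are the
# ones that would be sent to annotators for confirmation or correction.
candidates = [(s, candidate_triggers(s)) for s in sentences if candidate_triggers(s)]
```

Filtering with a permissive lexicon first means annotators see mostly sentences that might contain a modality trigger, which is the point of the high-recall step when triggers are sparse. The confirmed labels would then serve as training data for a multi-class classifier.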
