CL | Dec 16, 2024

On Crowdsourcing Task Design for Discourse Relation Annotation

arXiv:2412.11637v1 | 12.221 citations | h-index: 18 | Has Code | COLING Workshops

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of designing effective annotation tasks for discourse relations. Its contribution is incremental: it compares existing methods with the aim of improving data quality for NLP researchers.

The study compared free-choice and forced-choice crowdsourcing methods for annotating implicit discourse relations in English. Comparing over 130,000 annotations of the DiscoGeM 1.0 corpus, it found that the free-choice approach produced less diverse annotations, even though it allows flexible connective insertion.

Interpreting implicit discourse relations involves complex reasoning: because overt connectives such as "because" or "then" are absent, readers must integrate semantic cues with background knowledge. These relations often allow multiple interpretations, which are best represented as distributions. In this study, we compare two established methods that crowdsource English implicit discourse relation annotation by connective insertion: a free-choice approach, which allows annotators to select any suitable connective, and a forced-choice approach, which asks them to select among a set of predefined options. Specifically, we re-annotate the whole DiscoGeM 1.0 corpus (initially annotated with the free-choice method) using the forced-choice approach. The free-choice approach allows for flexible and intuitive insertion of various context-dependent connectives. A comparison of over 130,000 annotations, however, shows that the free-choice strategy produces less diverse annotations, often converging on common labels. Analysis of the results reveals the interplay between task design and the annotators' abilities to interpret and produce discourse relations.
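To make the idea of representing interpretations as distributions concrete, here is a minimal Python sketch. It turns the crowd labels collected for one item into a relative-frequency distribution and scores its diversity with Shannon entropy. The abstract does not state which diversity measure the authors use, so entropy is an assumption here. The function names and the label lists are hypothetical toy values invented for illustration, not data from DiscoGeM.

```python
import math
from collections import Counter


def label_distribution(labels):
    """Convert a list of crowd labels for one item into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def shannon_entropy(dist):
    """Entropy in bits; higher values mean more diverse annotations."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical toy labels for a single item under each task design
# (invented for illustration, not taken from the corpus).
free_choice_labels = ["result", "result", "result", "conjunction"]
forced_choice_labels = ["result", "conjunction", "precedence", "result"]

free_dist = label_distribution(free_choice_labels)
forced_dist = label_distribution(forced_choice_labels)

print(free_dist, shannon_entropy(free_dist))
print(forced_dist, shannon_entropy(forced_dist))
```

Averaging such a per-item score over a corpus would give one rough way to compare how strongly each task design converges on common labels. Other measures, such as the number of distinct labels per item, would serve the same purpose.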

