CL Apr 22, 2019

Fine-Grained Argument Unit Recognition and Classification

Dietrich Trautmann, Johannes Daxenberger, Christian Stab, Hinrich Schütze, Iryna Gurevych

arXiv:1904.09688v43.670 citations Has Code

Originality: Incremental advance

AI Analysis

This work aims to make argument retrieval easier for both humans and machines to use by moving from sentence-level classification to fine-grained, token-level annotation. It is an incremental advance in natural language processing.

The paper targets the low recall and sentence segmentation errors of sentence-level argument retrieval by proposing a fine-grained sequence labeling task, Argument Unit Recognition and Classification (AURC). The resulting dataset, AURC-8, contains up to 15% more arguments per topic than sentence-level annotations, and the proposed methods come close to human performance on known domains.

Prior work has commonly defined argument retrieval from heterogeneous document collections as a sentence-level classification task. Consequently, argument retrieval suffers both from low recall and from sentence segmentation errors making it difficult for humans and machines to consume the arguments. In this work, we argue that the task should be performed on a more fine-grained level of sequence labeling. For this, we define the task as Argument Unit Recognition and Classification (AURC). We present a dataset of arguments from heterogeneous sources annotated as spans of tokens within a sentence, as well as with a corresponding stance. We show that and how such difficult argument annotations can be effectively collected through crowdsourcing with high interannotator agreement. The new benchmark, AURC-8, contains up to 15% more arguments per topic as compared to annotations on the sentence level. We identify a number of methods targeted at AURC sequence labeling, achieving close to human performance on known domains. Further analysis also reveals that, contrary to previous approaches, our methods are more robust against sentence segmentation errors. We publicly release our code and the AURC-8 dataset.
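To make the difference between sentence-level and token-level annotation concrete, the following is a minimal sketch of how per-token stance labels could be grouped into argument units. It is not the authors' released code: the label names `PRO`, `CON`, and `NON`, the helper `extract_units`, and the example sentence are all assumptions chosen for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def extract_units(
    tokens: List[str], labels: List[str], outside: str = "NON"
) -> List[Tuple[int, int, str]]:
    """Group consecutive tokens that share a stance label into spans.

    Returns (start, end, label) triples with `end` exclusive.
    Tokens labeled `outside` (assumed name: "NON") are not part of any unit.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    units = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != outside:
            units.append((start, start + length, label))
        start += length
    return units


# Hypothetical sentence that contains two argument units with opposite stances.
tokens = "Nuclear power is reliable but the waste remains dangerous for centuries".split()
labels = ["PRO"] * 4 + ["NON"] + ["CON"] * 6

units = extract_units(tokens, labels)
for begin, end, stance in units:
    print(stance, " ".join(tokens[begin:end]))
```

A sentence-level classifier would have to assign this whole sentence a single label, whereas the span view recovers two separate units. This is one way to see how token-level annotation can surface more arguments per topic.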
