Automatic Argument Quality Assessment -- New Datasets and Methods
This work addresses the need for better natural language processing tools for evaluating argument quality. Its contribution is incremental: it builds on existing methods and adds new data.
The authors tackle automatic argument quality assessment by collecting and annotating new datasets of 6.3k arguments and 14k argument pairs. They also develop neural methods based on a recently released language model that match the state of the art on argument ranking and outperform earlier methods on argument-pair classification.
We explore the task of automatic assessment of argument quality. To that end, we actively collected 6.3k arguments, more than five times the amount of data examined in previous work. Each argument was explicitly and carefully annotated for its quality. In addition, 14k pairs of arguments were annotated independently, identifying the higher-quality argument in each pair. Despite the inherently subjective nature of the task, both annotation schemes led to surprisingly consistent results. We release the labeled datasets to the community. Furthermore, we suggest neural methods based on a recently released language model, for argument ranking as well as for argument-pair classification. In the former task, our results are comparable to the state of the art; in the latter, our results significantly outperform earlier methods.
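The abstract reports that the point-wise annotations (a quality score per argument) and the pairwise annotations (which argument in a pair is better) agree surprisingly well. One simple way to quantify such agreement is sketched below: for each annotated pair, check whether the argument preferred by pair annotators also has the higher point-wise score. This is an illustrative sketch under stated assumptions, not necessarily the authors' procedure; the function name `pair_agreement`, the dictionary of scores, and the `(arg_a, arg_b, preferred)` triple format are all hypothetical.

```python
from typing import Dict, List, Tuple


def pair_agreement(
    scores: Dict[str, float],
    pairs: List[Tuple[str, str, str]],
) -> float:
    """Fraction of pairs where the pairwise preference matches the point-wise scores.

    scores: hypothetical mapping from argument id to its point-wise quality score.
    pairs:  hypothetical (arg_a, arg_b, preferred) triples, where `preferred`
            is the id of the argument judged better in that pair.
    Pairs with a missing id or tied scores are skipped, since the point-wise
    scores express no preference for them. Returns NaN if no pair is usable.
    """
    agree = 0
    counted = 0
    for a, b, preferred in pairs:
        if a not in scores or b not in scores or scores[a] == scores[b]:
            continue
        # The argument that the point-wise annotation ranks higher.
        higher = a if scores[a] > scores[b] else b
        counted += 1
        if higher == preferred:
            agree += 1
    return agree / counted if counted else float("nan")


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the released datasets.
    scores = {"a1": 0.9, "a2": 0.4, "a3": 0.7}
    pairs = [("a1", "a2", "a1"), ("a2", "a3", "a3"), ("a1", "a3", "a3")]
    print(pair_agreement(scores, pairs))  # two of three pairs agree
```

Skipping ties keeps the measure focused on pairs where the two annotation schemes can actually disagree; a fuller analysis might instead report a rank correlation or weight pairs by annotator confidence.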