NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes
This work addresses data scarcity and annotation complexity for researchers and practitioners in argument mining and natural language processing, although its main advance, automating corpus creation, is incremental.
The paper tackles the limitations of small, manually annotated corpora in argument mining and generation by introducing an automated methodology for generating natural language arguments. The result is the largest publicly available multilingual corpus of argumentation schemes, accompanied by baselines and fine-tuned models for scheme identification.
Some of the major limitations identified in the areas of argument mining, argument generation, and natural language argument analysis are related to the complexity of annotating argumentatively rich data, the limited size of the resulting corpora, and the constraints imposed by the different languages and domains in which these data are annotated. To address these limitations, in this paper we present the following contributions: (i) an effective methodology for the automatic generation of natural language arguments on different topics and in different languages, (ii) the largest publicly available corpus of natural language argumentation schemes, and (iii) a set of solid baselines and fine-tuned models for the automatic identification of argumentation schemes.