CL, AI | Feb 22, 2024

NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes

Ramon Ruiz-Dolz, Joaquin Taverner, John Lawrence, Chris Reed

arXiv:2402.14458v13.414 citations | h-index: 7 | Data Br

Originality: Incremental advance

AI Analysis

This paper addresses data scarcity and annotation complexity, two problems facing researchers and practitioners in argument mining and natural language processing, though its contribution of automating corpus creation is incremental.

The paper tackles the limitations of small, manually annotated corpora in argument mining and generation by introducing an automated methodology for generating natural language arguments. The result is the largest publicly available multilingual corpus of argumentation schemes, accompanied by baselines and fine-tuned models for scheme identification.

Some of the major limitations identified in the areas of argument mining, argument generation, and natural language argument analysis are related to the complexity of annotating argumentatively rich data, the limited size of these corpora, and the constraints imposed by the different languages and domains in which these data are annotated. To address these limitations, in this paper we present the following contributions: (i) an effective methodology for the automatic generation of natural language arguments in different topics and languages, (ii) the largest publicly available corpus of natural language argumentation schemes, and (iii) a set of solid baselines and fine-tuned models for the automatic identification of argumentation schemes.

