CL · May 23, 2022

Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets

arXiv:2205.11472v3 · 23 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of Argument Mining datasets, whose annotation depends on expert knowledge, and offers a way to reduce the amount of training data needed while maintaining performance. The contribution is incremental, as it builds on existing fine-tuning approaches.

The study investigated how dataset composition affects few- and zero-shot performance in Argument Mining. It found that reducing the training sample size by up to almost 90%, while using carefully composed samples, can still achieve 95% of the maximum performance across three tasks and three datasets.

The task of Argument Mining, that is, extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of the retrieved arguments. In this work, we investigate the effect of Argument Mining dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance. This gain is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.
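The abstract does not spell out how the training samples are composed. As a purely hypothetical illustration of the "diversity over size" trade-off named in the title, the Python sketch below draws two subsets with the same sample budget from a toy dataset: one spread across many topics, one concentrated on a single topic. The helper `topic_diverse_subset`, the `(topic, text, label)` tuple format, and the toy data are all assumptions for illustration, not the authors' code or data.

```python
import random
from collections import defaultdict


def topic_diverse_subset(samples, n_topics, per_topic, seed=0):
    """Hypothetical helper (not from the paper): pick up to `n_topics` topics
    at random and keep at most `per_topic` samples from each.

    `samples` is assumed to be a list of (topic, text, label) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_topic = defaultdict(list)
    for sample in samples:
        by_topic[sample[0]].append(sample)  # group samples by their topic
    topics = sorted(by_topic)
    chosen_topics = rng.sample(topics, min(n_topics, len(topics)))
    subset = []
    for topic in chosen_topics:
        pool = by_topic[topic]
        subset.extend(rng.sample(pool, min(per_topic, len(pool))))
    return subset


# Toy dataset (assumed): 8 topics x 100 sentences = 800 samples, 3 dummy labels.
full = [(f"topic{t}", f"sentence {t}-{i}", i % 3) for t in range(8) for i in range(100)]

# Same budget of 80 samples (a 90% reduction), composed two different ways.
diverse = topic_diverse_subset(full, n_topics=8, per_topic=10)
narrow = topic_diverse_subset(full, n_topics=1, per_topic=80)

print(len(diverse), len({s[0] for s in diverse}))  # samples, distinct topics
print(len(narrow), len({s[0] for s in narrow}))
```

Holding the sample budget fixed and varying only the number of topics isolates topic diversity as the variable of interest. The toy data says nothing about model performance; it only shows how such training subsets could be assembled.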

Code Implementations: 1 repo
