CL · Feb 19, 2025

Fine-grained Fallacy Detection with Human Label Variation

arXiv:2502.13853v1 · 12 citations · h-index: 11 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses natural annotator disagreement in fallacy detection. The advance is incremental: it incorporates human label variation into both dataset creation and evaluation.

The authors tackled fallacy detection in social media by introducing Faina, a dataset of over 11K span-level annotations across 20 fallacy types that accounts for human label variation and multiple plausible answers. They developed an evaluation framework that moves beyond a single ground truth and showed that transformer-based approaches are strong baselines across four detection setups.

We introduce Faina, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. Faina includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian about migration, climate change, and public health given by two expert annotators. Through an extensive annotation study that allowed discussion over multiple rounds, we minimize annotation errors whilst keeping signals of human label variation. Moreover, we devise a framework that goes beyond "single ground truth" evaluation and simultaneously accounts for multiple (equally reliable) test sets and the peculiarities of the task, i.e., partial span matches, overlaps, and the varying severity of labeling errors. Our experiments across four fallacy detection setups show that multi-task and multi-label transformer-based approaches are strong baselines across all settings. We release our data, code, and annotation guidelines to foster research on fallacy detection and human label variation more broadly.
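To make the evaluation idea in the abstract more concrete, here is a minimal sketch of span-level scoring that gives partial credit for partial span matches and averages over several equally reliable reference sets. It is not the paper's metric: the function names, the character-overlap credit scheme, and the example labels (`ad_hominem`, `strawman`) are all assumptions made for illustration, and the sketch does not model the varying severity of labeling errors.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label). Labels here are hypothetical.
Span = Tuple[int, int, str]


def overlap_ratio(a: Span, b: Span) -> float:
    """Fraction of span `a` covered by span `b`; 0.0 if the labels differ."""
    if a[2] != b[2]:
        return 0.0
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    length = a[1] - a[0]
    return inter / length if length > 0 else 0.0


def soft_prf(pred: List[Span], gold: List[Span]) -> Tuple[float, float, float]:
    """Soft precision, recall and F1 with partial credit for partial matches.

    Each span is credited with its best coverage by any span on the other
    side, so overlapping spans are scored independently of one another.
    """
    precision = (
        sum(max((overlap_ratio(p, g) for g in gold), default=0.0) for p in pred)
        / len(pred)
        if pred
        else 0.0
    )
    recall = (
        sum(max((overlap_ratio(g, p) for p in pred), default=0.0) for g in gold)
        / len(gold)
        if gold
        else 0.0
    )
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


def multi_reference_f1(pred: List[Span], references: List[List[Span]]) -> float:
    """Average soft F1 over several reference sets (e.g. one per annotator)."""
    return sum(soft_prf(pred, ref)[2] for ref in references) / len(references)


# Hypothetical example: one prediction scored against two annotators.
pred = [(0, 10, "ad_hominem")]
annotator_a = [(0, 5, "ad_hominem")]
annotator_b = [(0, 10, "ad_hominem"), (20, 30, "strawman")]
score = multi_reference_f1(pred, [annotator_a, annotator_b])
```

Averaging over annotators is only one way to combine reference sets; taking the maximum per instance would instead reward agreement with any single plausible answer.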

Code Implementations: 1 repo
