CL, AI, LG | Nov 16, 2023

MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification

arXiv:2311.09761v2 | 32 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of fallacy detection and classification in NLP, providing a unified benchmark and evaluation methods, but it is incremental as it builds on existing datasets and taxonomies.

The authors introduced MAFALDA, a benchmark for fallacy classification that unifies previous datasets and taxonomies, and evaluated language models and humans in zero-shot settings to assess their capability in detecting and classifying fallacies.

We introduce MAFALDA, a benchmark for fallacy classification that merges and unifies previous fallacy datasets. It comes with a taxonomy that aligns, refines, and unifies existing classifications of fallacies. We further provide a manual annotation of part of the dataset, together with manual explanations for each annotation. We propose a new annotation scheme tailored for subjective NLP tasks, and a new evaluation method designed to handle subjectivity. We then evaluate several language models in a zero-shot setting, as well as human performance, on MAFALDA to assess their ability to detect and classify fallacies.

Code Implementations: 1 repo
