CL AI LG Nov 16, 2023

MAFALDA: A Benchmark and Comprehensive Study of Fallacy Detection and Classification

Chadi Helwe, Tom Calamai, Pierre-Henri Paris, Chloé Clavel, Fabian Suchanek

arXiv:2311.09761v2 10.332 citations h-index: 7 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses fallacy detection and classification in NLP by providing a unified benchmark and evaluation methods. Its contribution is incremental, since it builds on existing datasets and taxonomies.

The authors introduce MAFALDA, a benchmark for fallacy classification that unifies previous datasets and taxonomies. They evaluate language models in a zero-shot setting, alongside human performance, to assess how well each detects and classifies fallacies.

We introduce MAFALDA, a benchmark for fallacy classification that merges and unites previous fallacy datasets. It comes with a taxonomy that aligns, refines, and unifies existing classifications of fallacies. We further provide a manual annotation of a part of the dataset together with manual explanations for each annotation. We propose a new annotation scheme tailored for subjective NLP tasks, and a new evaluation method designed to handle subjectivity. We then evaluate several language models under a zero-shot learning setting, as well as human performance, on MAFALDA to assess their capability to detect and classify fallacies.
