CL | Sep 14, 2025

Joint Effects of Argumentation Theory, Audio Modality and Data Enrichment on LLM-Based Fallacy Classification

Hongxu Zhou, Hylke Westerdijk, Khondoker Ittehadul Islam

arXiv:2509.11127v1 | 2.7 | h-index: 1

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of improving LLM reasoning for fallacy detection in political discourse, but the results are incremental, as the proposed enhancements often worsened performance.

The study examined how context and emotional tone metadata affect LLM performance in classifying fallacies in political debates, finding that these additions often reduced accuracy and biased the model toward labeling statements as Appeal to Emotion.

This study investigates how context and emotional tone metadata influence large language model (LLM) reasoning and performance in fallacy classification tasks, particularly within political debate settings. Using data from U.S. presidential debates, we classify six fallacy types through various prompting strategies applied to the Qwen-3 (8B) model. We introduce two theoretically grounded Chain-of-Thought frameworks, Pragma-Dialectics and the Periodic Table of Arguments, and evaluate their effectiveness against a baseline prompt under three input settings: text-only, text with context, and text with both context and audio-based emotional tone metadata. Results suggest that while theoretical prompting can improve interpretability and, in some cases, accuracy, the addition of context and especially emotional tone metadata often leads to lowered performance. Emotional tone metadata biases the model toward labeling statements as Appeal to Emotion, worsening logical reasoning. Overall, basic prompts often outperformed enhanced ones, suggesting that attention dilution from added inputs may worsen rather than improve fallacy classification in LLMs.
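
The experimental design described above (three prompt styles crossed with three input settings) can be pictured as a small grid of conditions. The following is a minimal sketch only: the function names, the prompt wording, and the identifiers for styles and settings are hypothetical and are not taken from the paper. The label list is left as a parameter because the abstract names only one of the six fallacy types.

```python
from itertools import product

# Hypothetical identifiers for the three prompt styles and three input
# settings named in the abstract; the paper's own naming may differ.
PROMPT_STYLES = ["baseline", "pragma_dialectics", "periodic_table_of_arguments"]
INPUT_SETTINGS = ["text", "text+context", "text+context+tone"]


def experiment_grid():
    """Return every (prompt style, input setting) pair: 3 x 3 = 9 conditions."""
    return list(product(PROMPT_STYLES, INPUT_SETTINGS))


def build_prompt(style, setting, statement, labels, context=None, tone=None):
    """Assemble an illustrative prompt for one condition.

    The wording is an assumption made for this sketch, not the authors' prompt.
    """
    parts = []
    if style == "pragma_dialectics":
        parts.append("Reason step by step using Pragma-Dialectics.")
    elif style == "periodic_table_of_arguments":
        parts.append("Reason step by step using the Periodic Table of Arguments.")
    # Context is included in the two richer input settings.
    if setting in ("text+context", "text+context+tone") and context:
        parts.append(f"Context: {context}")
    # Emotional tone metadata is included only in the richest setting.
    if setting == "text+context+tone" and tone:
        parts.append(f"Emotional tone (from audio): {tone}")
    parts.append(f"Statement: {statement}")
    parts.append("Classify the fallacy. Options: " + ", ".join(labels))
    return "\n".join(parts)
```

Keeping the input setting as an explicit argument makes it straightforward to run the same statement through all nine conditions and compare predictions, which is the kind of comparison the abstract reports.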
