Incivility and Rigidity: Evaluating the Risks of Fine-Tuning LLMs for Political Argumentation
This work addresses the challenge of developing AI systems that support productive political discourse, but its contribution is incremental: it builds on existing fine-tuning and evaluation methods.
The study fine-tunes large language models (LLMs) for political argumentation and evaluates risks such as incivility and rigidity. Reddit-fine-tuned models produced safer but more rigid arguments, cross-platform fine-tuning increased adversarial tone and toxicity, and prompt-based steering reduced overt toxicity without fully countering the effect of noisy training data.
Incivility on platforms such as Twitter (now X) and Reddit complicates the development of AI systems that can support productive, rhetorically sound political argumentation. We present experiments with \textit{GPT-3.5 Turbo} fine-tuned on two contrasting datasets of political discourse: high-incivility Twitter replies to members of the U.S. Congress and low-incivility posts from Reddit's \textit{r/ChangeMyView}. Our evaluation examines how data composition and prompting strategies affect the rhetorical framing and deliberative quality of model-generated arguments. Results show that Reddit-fine-tuned models generate safer but rhetorically rigid arguments, while cross-platform fine-tuning amplifies adversarial tone and toxicity. Prompt-based steering reduces overt toxicity (e.g., personal attacks) but cannot fully offset the influence of noisy training data. We introduce a rhetorical evaluation rubric (covering justification, reciprocity, alignment, and authority) and provide implementation guidelines for authoring, moderation, and deliberation-support systems.
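The four rubric dimensions can be made concrete as a small scoring record. The sketch below is a minimal Python illustration under stated assumptions: the 0 to 2 ordinal scale, the one-line glosses of each dimension, and the names `RubricScore` and `profile` are all hypothetical, since the abstract names the dimensions but not how they are defined or scored. It validates per-argument scores and averages them per dimension, which is one way to compare rubric profiles across fine-tuning conditions.

```python
from dataclasses import dataclass, fields
from statistics import fmean

# HYPOTHETICAL ordinal scale: 0 = absent, 1 = partial, 2 = strong.
# The abstract does not specify the rubric's actual scale or anchors.
SCALE = (0, 1, 2)


@dataclass(frozen=True)
class RubricScore:
    """Scores for one generated argument on the four rubric dimensions.

    The glosses below are assumed readings of the dimension names,
    not the paper's definitions.
    """

    justification: int  # assumed: claims are backed by reasons or evidence
    reciprocity: int     # assumed: engages the opposing side's points
    alignment: int       # assumed: responds to the thread it replies to
    authority: int       # assumed: appeals to credible sources or expertise

    def __post_init__(self) -> None:
        # Reject any dimension score outside the hypothetical scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in SCALE:
                raise ValueError(f"{f.name}={value!r} is outside {SCALE}")


def profile(scores: list[RubricScore]) -> dict[str, float]:
    """Return the mean score per dimension across a set of arguments."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        f.name: fmean(getattr(s, f.name) for s in scores)
        for f in fields(RubricScore)
    }
```

For example, `profile([RubricScore(2, 1, 2, 1), RubricScore(2, 0, 2, 1)])` returns `{'justification': 2.0, 'reciprocity': 0.5, 'alignment': 2.0, 'authority': 1.0}`; these are toy inputs, not results from the study. Keeping the dimensions separate rather than summing them into one total preserves trade-offs that a single score would hide.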