CL · May 28, 2025

LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High

arXiv:2505.22354v2 · 10 citations · h-index: 4 · CogSci
Originality: Incremental advance
AI Analysis

The paper addresses the problem of LLMs reinforcing misinformation, a concern for both users and developers. The advance is incremental because it builds on existing methods of linguistic analysis.

The paper investigates how large language models (LLMs) handle false presuppositions, which subtly embed misinformation, and finds that models like GPT-4-o, LLama-3-8B, and Mistral-7B-v03 struggle to recognize and reject these false presuppositions, with performance varying based on linguistic and political factors.

This paper examines how LLMs handle false presuppositions and whether certain linguistic factors influence their responses to falsely presupposed content. Presuppositions subtly introduce information as given, making them highly effective at embedding disputable or false information. This raises concerns about whether LLMs, like humans, may fail to detect and correct misleading assumptions introduced as false presuppositions, even when the stakes of misinformation are high. Using a systematic approach based on linguistic presupposition analysis, we investigate the conditions under which LLMs are more or less sensitive to adopt or reject false presuppositions. Focusing on political contexts, we examine how factors like linguistic construction, political party, and scenario probability impact the recognition of false presuppositions. We conduct experiments with a newly created dataset and examine three LLMs: OpenAI's GPT-4-o, Meta's LLama-3-8B, and MistralAI's Mistral-7B-v03. Our results show that the models struggle to recognize false presuppositions, with performance varying by condition. This study highlights that linguistic presupposition analysis is a valuable tool for uncovering the reinforcement of political misinformation in LLM responses.

