Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints
This work addresses the unreliability of LLM outputs for users who rely on AI for constrained tasks. It shows that reasoning can be counterproductive, offering an incremental challenge to the assumption that reasoning is universally beneficial.
The study examined how reasoning in large language models (LLMs) affects output reliability under strict constraints, finding that reasoning models reduce constraint violations (13-26% vs. 66-75% for non-reasoning models) but systematically distort facts and increase fabrications, revealing a trade-off between compliance and factual accuracy.
With the widespread adoption of large language models (LLMs), hallucinations, which are non-factual fabrications in model outputs, have become a serious concern. Reasoning capabilities have attracted attention as a self-verification process that may improve output reliability. However, the effect of reasoning within a closed system, where LLMs cannot rely on external tools or knowledge, has yet to be clarified. We therefore conduct experiments under strict constraints (recommending peer-reviewed journal articles in computer science) to examine the effect of reasoning in two models (GPT-5.2 and Gemini 3 Flash). Our results reveal a problematic trade-off between constraint compliance and factual accuracy. Non-reasoning models exhibit high constraint violation rates (66-75%) but maintain factual accuracy, whereas reasoning models reduce violations (13-26%) but systematically distort known facts to satisfy the constraints and produce more complete fabrications. This trade-off is consistent across both models despite their different architectures, indicating a fundamental limitation of reasoning. Furthermore, reasoning does not uniformly improve output authenticity: its effects diverge by model, reflecting different allocations of the compliance-truthfulness trade-off. These findings challenge the assumption that reasoning universally improves reliability: reasoning models trade honest constraint violations for detection-resistant distortions.
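To make the outcome categories above concrete, here is a minimal, hypothetical Python sketch of how a single recommendation could be labeled once it has been checked against an external bibliographic source. The names `Recommendation`, `GroundTruth`, `classify`, `label_rates`, and `apparent_compliance`, and the assumption that a verified venue record is available for each item, are illustrative only and are not taken from the study's protocol. The sketch also illustrates why distortions are detection-resistant: judged only by the model's own claims, a distorted item looks compliant.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labeling scheme; not the study's actual annotation protocol.
LABELS = ("compliant", "violation", "distortion", "fabrication")


@dataclass
class Recommendation:
    title: str
    claimed_journal: bool  # the model asserts this is a peer-reviewed journal article


@dataclass
class GroundTruth:
    is_peer_reviewed_journal: bool  # venue verified against an external source (assumed)


def classify(rec: Recommendation, truth: Optional[GroundTruth]) -> str:
    """Label one recommendation; `truth` is None when no matching article exists."""
    if truth is None:
        return "fabrication"  # the article does not exist at all
    if truth.is_peer_reviewed_journal:
        return "compliant"  # real article that satisfies the constraint
    # The article exists but breaks the constraint.
    if rec.claimed_journal:
        return "distortion"  # misrepresented so that it appears compliant
    return "violation"  # honest: openly presented as a non-journal item


def label_rates(labels: List[str]) -> Dict[str, float]:
    """Fraction of recommendations falling into each category."""
    counts = Counter(labels)
    n = len(labels)
    return {k: counts[k] / n for k in LABELS}


def apparent_compliance(recs: List[Recommendation]) -> float:
    """Compliance a reader would infer from the model's own claims alone."""
    return sum(r.claimed_journal for r in recs) / len(recs)


# Usage: one illustrative item per category (made-up data, not study results).
recs = [
    Recommendation("A", True),
    Recommendation("B", False),
    Recommendation("C", True),
    Recommendation("D", True),
]
truths = [GroundTruth(True), GroundTruth(False), GroundTruth(False), None]
labels = [classify(r, t) for r, t in zip(recs, truths)]
print(labels)  # ['compliant', 'violation', 'distortion', 'fabrication']
print(label_rates(labels)["compliant"], apparent_compliance(recs))  # 0.25 0.75
```

With one item per category, true compliance is 0.25 while compliance inferred from the model's claims is 0.75; in this toy example the gap comes entirely from the distorted and fabricated items, which is the kind of error that checking constraint satisfaction alone would miss.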