CY, AI, CL · Nov 8, 2025

The Polite Liar: Epistemic Pathology in Language Models

arXiv:2511.07477v1 · 1 citation

Originality: Incremental advance

AI Analysis

This paper addresses epistemic pathology in AI systems, highlighting for researchers and developers a structural issue in current alignment methods.

The paper argues that large language models exhibit confident fabrication due to reinforcement learning from human feedback (RLHF), which optimizes for perceived sincerity over evidential accuracy, revealing a tension between linguistic cooperation and epistemic integrity.

Large language models exhibit a peculiar epistemic pathology: they speak as if they know, even when they do not. This paper argues that such confident fabrication, what I call the polite liar, is a structural consequence of reinforcement learning from human feedback (RLHF). Building on Frankfurt's analysis of bullshit as communicative indifference to truth, I show that this pathology is not deception but structural indifference: a reward architecture that optimizes for perceived sincerity over evidential accuracy. Current alignment methods reward models for being helpful, harmless, and polite, but not for being epistemically grounded. As a result, systems learn to maximize user satisfaction rather than truth, performing conversational fluency as a virtue. I analyze this behavior through the lenses of epistemic virtue theory, speech-act philosophy, and cognitive alignment, showing that RLHF produces agents trained to mimic epistemic confidence without access to epistemic justification. The polite liar thus reveals a deeper alignment tension between linguistic cooperation and epistemic integrity. The paper concludes with an "epistemic alignment" principle: reward justified confidence over perceived fluency.
