CLAIMar 27

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

arXiv:2604.19768 · 18.8 · h-index: 5
Predicted impact: top 70% in CL · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses the detection of AI-generated content with unreliable epistemic grounding and provides a tool for screening and detection. The contribution is incremental because it builds on existing theoretical frameworks.

The study examines systematic miscalibration in large language models (LLMs), where rhetorical intensity does not match epistemic grounding. It proposes a framework that quantifies this decoupling with metrics such as form-meaning divergence and rhetorical device distribution entropy. LLM-generated texts showed significantly elevated form-meaning divergence (Δ=0.68, p<0.001) and produced tricolon at nearly twice the expert rate (Δ=0.95).

Large language models (LLMs) exhibit systematic miscalibration in which rhetorical intensity is not proportionate to epistemic grounding. This study tests this hypothesis and proposes a framework for quantifying this decoupling by designing a triadic epistemic-rhetorical marker (ERM) taxonomy. The taxonomy is operationalized through three composite metrics: form-meaning divergence (FMD), genuine-to-performed epistemic ratio (GPR), and rhetorical device distribution entropy (RDDE). Applied to 225 argumentative texts spanning approximately 0.6 million tokens across human expert, human non-expert, and LLM-generated sub-corpora, the framework identifies a consistent, model-agnostic LLM epistemic signature. LLM-generated texts produce tricolon at nearly twice the expert rate ($\Delta = 0.95$), while human authors produce erotema at more than twice the LLM rate. Performed hesitancy markers appear at twice the human density in LLM output. FMD is significantly elevated in LLM texts relative to both human groups ($p < 0.001$, $\Delta = 0.68$), and rhetorical devices are distributed significantly more uniformly across LLM documents. The findings are consistent with theoretical intuitions derived from Gricean pragmatics, Relevance Theory, and Brandomian inferentialism. The annotation pipeline is fully automatable, making it deployable as a lightweight screening tool for epistemic miscalibration in AI-generated content and as a theoretically motivated feature set for LLM-generated text detection pipelines.
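The abstract does not give a formula for RDDE, so the following is only a rough sketch of how an entropy over device placement might be computed, not the authors' definition. It assumes that each detected rhetorical device has a token offset, that the document is cut into equal-width segments, and that the score is Shannon entropy normalized to [0, 1]. The names `segment_counts` and `normalized_entropy`, and the choice of four segments, are hypothetical.

```python
import math
from typing import List, Sequence


def segment_counts(positions: Sequence[int], doc_length: int,
                   n_segments: int = 4) -> List[int]:
    """Count device occurrences per equal-width segment of the document.

    `positions` are token offsets of detected devices (an assumed input;
    the paper's annotation pipeline is not reproduced here).
    """
    counts = [0] * n_segments
    for pos in positions:
        # Map a token offset to its segment, clamping the final offset.
        idx = min(pos * n_segments // doc_length, n_segments - 1)
        counts[idx] += 1
    return counts


def normalized_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of a count distribution, scaled to [0, 1]."""
    total = sum(counts)
    nonzero = [c for c in counts if c > 0]
    # Zero or one occupied bin carries no spread, so entropy is 0.
    if total == 0 or len(nonzero) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in nonzero)
    # Divide by the maximum possible entropy for this number of bins.
    return entropy / math.log2(len(counts))


# Evenly spread devices versus devices clustered at the start.
even = normalized_entropy(segment_counts([10, 35, 60, 85], doc_length=100))
clustered = normalized_entropy(segment_counts([1, 2, 3, 4], doc_length=100))
```

Under these assumptions, a score near 1 means devices are spread evenly through the document and a score near 0 means they cluster in one region, which is one way to read the abstract's finding that devices are distributed more uniformly in LLM documents.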

