CL · Nov 24, 2025

Representational and Behavioral Stability of Truth in Large Language Models

arXiv:2511.19166v3 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the reliability of LLMs as information sources by introducing an evaluation framework for epistemic robustness. The advance is incremental: it extends existing accuracy-based factuality evaluation rather than replacing it.

The study examines how unstable the truth judgments of large language models (LLMs) become under semantic perturbations. It finds that synthetic content induces retractions of up to 36.3% of previously held beliefs in behavioral evaluations, while fictional content is comparatively stable.

Large language models (LLMs) are increasingly used as information sources, yet small changes in semantic framing can destabilize their truth judgments. We propose P-StaT (Perturbation Stability of Truth), an evaluation framework for testing belief stability under controlled semantic perturbations in representational and behavioral settings via probing and zero-shot prompting. Across sixteen open-source LLMs and three domains, we compare perturbations involving epistemically familiar Neither statements drawn from well-known fictional contexts (Fictional) to those involving unfamiliar Neither statements not seen in training data (Synthetic). We find a consistent stability hierarchy: Synthetic content aligns closely with factual representations and induces the largest retractions of previously held beliefs, producing up to 32.7% retractions in representational evaluations and up to 36.3% in behavioral evaluations. By contrast, Fictional content is more representationally distinct and comparatively stable. Together, these results suggest that epistemic familiarity is a robust signal across instantiations of belief stability under semantic reframing, complementing accuracy-based factuality evaluation with a notion of epistemic robustness.
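To make the retraction percentages concrete, here is a minimal sketch of one way such a rate could be computed. It assumes a retraction means a statement judged true before the perturbation is no longer judged true afterwards. The abstract does not give the paper's exact definition, so the function name `retraction_rate` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from typing import Sequence


def retraction_rate(before: Sequence[bool], after: Sequence[bool]) -> float:
    """Fraction of initially accepted statements that are rejected after perturbation.

    Assumption (not from the paper): before[i] and after[i] are the model's
    true/false judgments of the same statement before and after a perturbation.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")

    # Keep only the post-perturbation judgments for statements the model
    # accepted as true beforehand; those are the "previously held beliefs".
    held = [a for b, a in zip(before, after) if b]
    if not held:
        return 0.0  # nothing was believed, so nothing can be retracted

    retracted = sum(1 for a in held if not a)
    return retracted / len(held)


# Toy data: five statements, four of them accepted before the perturbation.
baseline = [True, True, True, False, True]
perturbed = [True, False, True, False, False]
rate = retraction_rate(baseline, perturbed)
print(f"retraction rate: {rate:.1%}")
```

Under this reading, a figure such as 36.3% would mean that roughly one in three previously accepted statements is no longer accepted after the perturbation.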

