Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

arXiv:2601.05905v1 · 1 citation · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the need for reliable LLM deployment by improving belief robustness, though the advance is incremental because it builds on existing evaluation methods.

The paper tackles the problem of LLMs maintaining truthful beliefs under contextual perturbations, showing that even facts with perfect self-consistency can collapse under mild interference, and proposes Structure-Aware Training to reduce long-tail knowledge brittleness by approximately 30%.

As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
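
The contrast between point-wise confidence and neighborhood-level coherence can be illustrated with a minimal sketch. The abstract does not state how NCB is computed, so the code below is not the authors' definition: `self_consistency` and `neighborhood_consistency` are hypothetical helper names, exact string matching stands in for whatever answer comparison the paper uses, and the answers are toy data.

```python
from collections import Counter
from typing import Sequence


def _norm(text: str) -> str:
    """Normalize an answer for exact-match comparison (an assumption)."""
    return text.strip().lower()


def self_consistency(samples: Sequence[str]) -> float:
    """Point-wise confidence: share of sampled answers matching the majority answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(_norm(s) for s in samples)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(samples)


def neighborhood_consistency(
    neighbor_answers: Sequence[str], neighbor_gold: Sequence[str]
) -> float:
    """Hypothetical proxy for a neighborhood score: share of related
    questions (the fact's conceptual neighbors) answered correctly."""
    if not neighbor_gold or len(neighbor_answers) != len(neighbor_gold):
        raise ValueError("answers and gold labels must be non-empty and aligned")
    hits = sum(_norm(a) == _norm(g) for a, g in zip(neighbor_answers, neighbor_gold))
    return hits / len(neighbor_gold)


# Toy data: the target fact is answered identically in every sample...
target_samples = ["Paris", "paris", " Paris "]
# ...but only half of the related questions are answered correctly.
neighbor_answers = ["France", "Seine", "Lyon", "Franc"]
neighbor_gold = ["France", "Seine", "Paris", "Euro"]

sc = self_consistency(target_samples)
nc = neighborhood_consistency(neighbor_answers, neighbor_gold)
print(f"self-consistency={sc:.2f}, neighborhood consistency={nc:.2f}")
```

In this toy case the target fact has perfect self-consistency while the neighborhood score is only 0.5, which is the kind of gap a point-wise measure alone would not reveal.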

