CL · Mar 18, 2025

Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations

arXiv:2503.14477v2 · 38 citations · h-index: 17 · EMNLP
Originality: Incremental advance
AI Analysis

The paper reduces overconfident hallucinations, which mislead users and erode trust in LLMs. The advance is incremental because it builds on existing uncertainty calibration methods.

The paper tackles overconfident hallucinations, where LLMs state false claims in assertive language. The authors find that verbal uncertainty is governed by a single linear feature in LLM representations, and they show that intervening on this feature at inference time reduces confident hallucinations on short-form answers by an average of ~30% in relative terms.

LLMs often adopt an assertive language style also when making false claims. Such "overconfident hallucinations" mislead users and erode trust. Achieving the ability to express in language the actual degree of uncertainty around a claim is therefore of great importance. We find that "verbal uncertainty" is governed by a single linear feature in the representation space of LLMs, and show that this has only moderate correlation with the actual "semantic uncertainty" of the model. We apply this insight and show that (1) the mismatch between semantic and verbal uncertainty is a better predictor of hallucinations than semantic uncertainty alone and (2) we can intervene on verbal uncertainty at inference time and reduce confident hallucinations on short-form answers, achieving an average relative reduction of ~30%.
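To make the idea of a "single linear feature" more concrete, here is a minimal sketch of one common way such a feature can be estimated and used. It is not taken from the paper. It assumes a difference-of-means direction between hidden states of hedged and assertive answers, an additive shift along that direction at inference time, and a mismatch score defined as semantic minus verbal uncertainty. All function names, the toy 8-dimensional activations, and the steering strength `alpha` are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means_direction(h_uncertain, h_confident):
    """Unit vector pointing from the mean 'confident' activation
    to the mean 'uncertain' activation (assumed estimator)."""
    mu_u = mean_vector(h_uncertain)
    mu_c = mean_vector(h_confident)
    diff = [u - c for u, c in zip(mu_u, mu_c)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project(h, direction):
    """Scalar coordinate of h along the direction."""
    return sum(a * b for a, b in zip(h, direction))


def steer(h, direction, alpha):
    """Shift a hidden state along the direction by alpha.
    Positive alpha pushes toward more hedged language."""
    return [a + alpha * b for a, b in zip(h, direction)]


def uncertainty_mismatch(semantic_u, verbal_u):
    """Assumed mismatch score: large when the model is semantically
    uncertain but expresses little uncertainty in words."""
    return semantic_u - verbal_u


# Toy data standing in for real hidden states (hypothetical):
# "uncertain" samples are shifted along coordinate 0.
random.seed(0)
DIM = 8


def sample(shift):
    v = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    v[0] += shift
    return v


h_confident = [sample(0.0) for _ in range(200)]
h_uncertain = [sample(3.0) for _ in range(200)]

direction = difference_of_means_direction(h_uncertain, h_confident)
h = h_confident[0]
h_steered = steer(h, direction, alpha=2.0)
```

Because the direction is normalized, steering with `alpha=2.0` raises the state's coordinate along the direction by exactly 2.0 and leaves the orthogonal components unchanged. In a real model the shift would be applied to a chosen layer's hidden states during generation, and the choice of layer and strength would need tuning.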

