LG · Dec 4, 2025

Distance Is All You Need: Radial Dispersion for Uncertainty Estimation in Large Language Models

arXiv:2512.04351v27.11 citations · h-index: 2

Originality: Highly original

AI Analysis

This paper addresses the need for reliable uncertainty estimation in LLMs, a prerequisite for trustworthy systems, by offering a practical, model-agnostic method that the authors report outperforms nine recent baselines.

The paper tackles uncertainty estimation in large language models (LLMs) by introducing the Radial Dispersion Score (RDS), a simple, training-free metric that measures semantic variability in embedding space. The authors report state-of-the-art hallucination detection and best-of-N performance across four datasets and four LLMs.

Detecting uncertainty in large language models (LLMs) is essential for building reliable systems, yet many existing approaches are overly complex and depend on brittle semantic clustering or access to model internals. We introduce Radial Dispersion Score (RDS), a simple, training-free, fully model-agnostic uncertainty metric that measures the radial dispersion of sampled generations in embedding space. Specifically, given $N$ sampled generations embedded on the unit hypersphere, RDS computes the total $\ell_1$ distance from the empirical centroid, i.e., the mean embedding, providing a direct geometric signal of semantic variability. A lightweight probability-weighted variant further incorporates the model's own token probabilities when available, outperforming nine recent state-of-the-art baselines. Moreover, RDS naturally extends to effective per-sample uncertainty estimates that complement probability- and consistency-based methods while remaining lightweight for practical use. Across four challenging free-form question-answering datasets and four LLMs, our metrics achieve state-of-the-art hallucination detection and best-of-$N$ performance, while remaining robust and scalable with respect to sample size and embedding choice. These results highlight the practical value of RDS and its contribution toward improving the trustworthiness of LLMs.
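The geometric idea in the abstract can be illustrated with a minimal sketch. It is written only from the abstract's description and is not the authors' implementation. The function name `radial_dispersion` is hypothetical. Treating each generation's own distance from the centroid as its per-sample score is an assumption. The probability-weighted variant is left out because the abstract does not specify its weighting.

```python
import math
from typing import List, Sequence, Tuple


def radial_dispersion(
    embeddings: Sequence[Sequence[float]],
) -> Tuple[float, List[float]]:
    """Return (total l1 distance from the centroid, per-sample l1 distances).

    Hypothetical helper: a sketch of the unweighted score as described in
    the abstract, not the paper's reference code.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")

    # Project every embedding onto the unit hypersphere (L2 normalization).
    unit: List[List[float]] = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("zero vector cannot be normalized")
        unit.append([x / norm for x in vec])

    n, dim = len(unit), len(unit[0])

    # Empirical centroid: the plain mean of the normalized embeddings.
    centroid = [sum(v[j] for v in unit) / n for j in range(dim)]

    # l1 distance of each generation from the centroid (assumed per-sample score).
    per_sample = [
        sum(abs(v[j] - centroid[j]) for j in range(dim)) for v in unit
    ]
    return sum(per_sample), per_sample


# Toy 2-D "embeddings": generations that agree vs. generations that disagree.
agree = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
disagree = [[1.0, 0.0], [0.0, 1.0]]
print(radial_dispersion(agree)[0])     # 0.0
print(radial_dispersion(disagree)[0])  # 2.0
```

In practice the input rows would come from a sentence-embedding model applied to the $N$ sampled generations. The toy vectors here only show that identical directions give zero dispersion and orthogonal ones give a larger score.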
