AI · May 27

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Mattia J. Villani, Pranav Deshpande, Akshay Seshadri, Romina Yalovetzky, Niraj Kumar

arXiv:2605.28264 · 71.8 · Has Code

Predicted impact: top 48% in AI · last 90 days

Originality: Highly original

AI Analysis

For practitioners deploying LLMs in high-stakes settings, the paper offers a lightweight, real-time hallucination detector with formal error guarantees, closing the gap between cheap single-pass and expensive multi-sample detection methods.

The paper introduces the Calibrated Entropy Score (CES), a single-pass, black-box method for detecting hallucinations in LLMs that uses the full distribution of token-level entropies rather than only their mean. Across eight QA benchmarks and ten generator models, CES achieves the highest detection performance among single-pass black-box methods and is statistically indistinguishable from multi-sample methods, while also providing formal error guarantees.

Large Language Models (LLMs) often generate factually incorrect outputs, commonly termed hallucinations, that undermine trust and limit deployment in high-stakes settings. Existing hallucination detection methods typically require multiple forward passes or access to model internals. In this work, we provide theoretical background and empirical evidence that the distribution of token-level entropies, beyond the mean captured by perplexity or length-normalised entropy, serves as a fingerprint of hallucination, with distributional shape and tail behaviour carrying independent signal. We formalize hallucination detection as a statistical hypothesis test and propose the Calibrated Entropy Score (CES), a lightweight algorithm requiring only a single forward pass and black-box access to token logits. CES combines the mean signal with the maximum signal of the generated entropy through a calibrated reference CDF, producing scores that are directly comparable across models and tasks. We establish finite-sample calibration guarantees via a novel random-length Dvoretzky–Kiefer–Wolfowitz inequality, and also prove that CES detects hallucinations with probability converging to one exponentially fast in the generation length. Across eight QA benchmarks and ten generator models spanning open-source and API access models, CES achieves the highest detection performance among all single-pass black-box methods while providing formal error guarantees that existing heuristics lack. Remarkably, CES is statistically indistinguishable from multi-sample methods that require far greater computational cost, closing the gap between lightweight and expensive detection and making it suitable for real-time, large-scale deployment.
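The ingredients named in the abstract (per-token entropies, their mean and maximum, and a reference CDF) can be illustrated with a minimal Python sketch. This is not the authors' CES algorithm: the abstract does not say how the mean and maximum signals are combined, so `toy_score` below simply takes the larger of the two calibrated ranks, which is an assumption made for illustration. The function names, the hand-made reference values, and the use of the classical fixed-sample-size Dvoretzky–Kiefer–Wolfowitz bound (rather than the paper's random-length variant) are likewise assumptions.

```python
import math
from bisect import bisect_right


def token_entropies(prob_rows):
    """Shannon entropy (in nats) of each next-token distribution."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in prob_rows]


def empirical_cdf(reference):
    """Return F(x) = fraction of reference values that are <= x."""
    ordered = sorted(reference)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def dkw_epsilon(n, alpha):
    """Classical fixed-n DKW half-width: with probability >= 1 - alpha,
    the empirical CDF of n i.i.d. samples is within eps of the true CDF."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def toy_score(entropies, ref_cdf_mean, ref_cdf_max):
    """Hypothetical combination (NOT the paper's rule): rank the mean and
    the maximum entropy against their reference CDFs, keep the larger rank."""
    mean_h = sum(entropies) / len(entropies)
    max_h = max(entropies)
    return max(ref_cdf_mean(mean_h), ref_cdf_max(max_h))


# Hand-made reference statistics standing in for a calibration set of
# generations known to be correct (illustrative values only).
ref_mean = empirical_cdf([0.1, 0.2, 0.3, 0.4])
ref_max = empirical_cdf([0.5, 0.6, 0.7, 0.8])

score = toy_score([0.1, 0.3, 0.65], ref_mean, ref_max)
print(score, dkw_epsilon(100, 0.05))
```

A score near 1 means the generation's entropy profile is more extreme than almost all reference generations. Because both signals pass through a CDF, the output lies in [0, 1] whatever the model's raw entropy scale, which is the sense in which a calibrated score is comparable across models and tasks.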
