LG · Apr 4, 2025

Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy

arXiv:2504.03579v2 · 3 citations · h-index: 17 · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient hallucination detection for LLM users, offering a method that is incremental but provides practical improvements in sample efficiency.

The paper tackles the problem of detecting hallucinations in large language models by proposing a Bayesian algorithm for estimating semantic entropy, which reduces the required samples by 47% compared to prior work while maintaining detection quality as measured by AUROC.

Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, due to us taking the Bayesian approach, we achieve a much better quality of semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that 'harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 53% of samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
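
To make the quantity in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the sampled answers have already been grouped into meaning clusters (the cluster labels are hypothetical inputs), and it contrasts a simple plug-in entropy estimate with a generic Bayesian estimate under a symmetric Dirichlet prior. The function names, the fixed `num_clusters`, the prior strength `alpha`, and the Monte Carlo approximation are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def plugin_semantic_entropy(cluster_ids):
    """Plug-in entropy (in nats) of the empirical distribution over meaning clusters."""
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    # Sum of p * log(1/p) over the clusters that were actually observed.
    return sum((c / n) * math.log(n / c) for c in counts.values())


def dirichlet_posterior_mean_entropy(cluster_ids, num_clusters, alpha=1.0,
                                     draws=20000, seed=0):
    """Monte Carlo estimate of the posterior-mean entropy (in nats).

    Assumption (not from the paper): the number of meaning clusters is known
    and the cluster probabilities have a symmetric Dirichlet(alpha) prior.
    """
    counts = list(Counter(cluster_ids).values())
    if len(counts) > num_clusters:
        raise ValueError("observed more clusters than num_clusters")
    # Posterior concentration: prior alpha plus the observed count per cluster;
    # clusters never observed keep the prior alpha alone.
    posterior = [alpha + c for c in counts] + [alpha] * (num_clusters - len(counts))

    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Draw from the Dirichlet posterior by normalising independent Gamma draws.
        gammas = [rng.gammavariate(a, 1.0) for a in posterior]
        norm = sum(gammas)
        total += sum((g / norm) * math.log(norm / g) for g in gammas if g > 0)
    return total / draws


# Hypothetical cluster labels for four sampled answers to one prompt.
samples = ["paris", "paris", "lyon", "paris"]
plugin = plugin_semantic_entropy(samples)
bayes = dirichlet_posterior_mean_entropy(samples, num_clusters=3)
```

With a single sample the plug-in estimate is always zero, whereas the prior keeps the Bayesian estimate positive: for one sample, `num_clusters=2`, and `alpha=1.0`, the posterior-mean entropy is about 0.5 nats. This only illustrates why a prior can help at very small budgets; the paper's estimator and its adaptive sampling rule may differ.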

