CL SY · Feb 6, 2025

Enhancing Hallucination Detection through Noise Injection

Litian Liu, Reza Pourreza, Sunny Panchal, Apratim Bhattacharyya, Yao Qin, Roland Memisevic

arXiv:2502.03799v2 · 16.312 citations · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the critical issue of hallucination detection for the safe deployment of LLMs, offering an incremental improvement over existing uncertainty-based methods.

The paper tackles the problem of detecting hallucinations in Large Language Models by proposing a noise injection method that perturbs model parameters during sampling, which significantly improves detection performance across various datasets and architectures.

Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. Effectively detecting hallucinations is therefore crucial for the safe deployment of LLMs. Recent research has linked hallucinations to model uncertainty, suggesting that hallucinations can be detected by measuring dispersion over answer distributions obtained from a set of samples drawn from a model. While drawing from the distribution over tokens defined by the model is a natural way to obtain samples, in this work, we argue that it is sub-optimal for the purpose of detecting hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. To this end, we propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling. We demonstrate its effectiveness across a wide range of datasets and model architectures.
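
The sampling idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the one-layer "model", the uniform noise distribution, the noise scale, and every name below (`sample_answers`, `answer_entropy`, `hidden`, `readout`) are assumptions made for illustration. It draws several answers with and without noise added to the hidden activations, then measures the dispersion of the sampled answers with empirical entropy.

```python
import math
import random
from collections import Counter


def answer_entropy(answers):
    """Empirical Shannon entropy (in nats) of a list of sampled answers."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_answers(hidden, readout, labels, n_samples, noise_scale, rng):
    """Draw answers from a hypothetical one-layer toy model.

    For each sample, uniform noise in [-noise_scale, noise_scale] is added to
    every hidden activation before the linear readout (an assumed noise
    choice). Setting noise_scale=0 reduces to plain sampling from the
    model's answer distribution.
    """
    answers = []
    for _ in range(n_samples):
        noisy = [h + rng.uniform(-noise_scale, noise_scale) for h in hidden]
        logits = [sum(w * h for w, h in zip(row, noisy)) for row in readout]
        probs = softmax(logits)
        answers.append(rng.choices(labels, weights=probs, k=1)[0])
    return answers


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0, 0.5]                                # toy hidden activations
    readout = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]   # toy readout weights
    labels = ["A", "B", "C"]

    plain = sample_answers(hidden, readout, labels, 20, 0.0, rng)
    noisy = sample_answers(hidden, readout, labels, 20, 1.0, rng)

    # Higher entropy means more disagreement among the sampled answers,
    # which an uncertainty-based detector would treat as a hallucination signal.
    print("entropy without noise:", answer_entropy(plain))
    print("entropy with noise:   ", answer_entropy(noisy))
```

In a real setting the answers would come from an LLM, equivalent answers would need to be grouped before computing entropy, and the entropy would be compared against a threshold chosen on held-out data; none of those choices are specified in the text above.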
