CL | Feb 15, 2024

Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

arXiv:2402.09733v1 | 60 citations | h-index: 45
Originality: Incremental advance
AI Analysis

This paper addresses hallucination in LLMs, which matters to users who rely on accurate AI-generated content, but its contribution is incremental because it builds on existing interpretation techniques.

The paper investigates whether LLMs are aware of hallucination by analyzing differences in hidden states between correct and hallucinated responses, finding that LLMs react differently and showing potential to use these insights to reduce hallucination.

Large Language Models (LLMs) can make up answers that are not real, a behavior known as hallucination. This research aims to see if, how, and to what extent LLMs are aware of hallucination. More specifically, we check whether and how an LLM reacts differently in its hidden states when it answers a question correctly versus when it hallucinates. To do this, we introduce an experimental framework that allows examining an LLM's hidden states in different hallucination situations. Building upon this framework, we conduct a series of experiments with language models in the LLaMA family (Touvron et al., 2023). Our empirical findings suggest that LLMs react differently when processing a genuine response versus a fabricated one. We then apply various model interpretation techniques to help understand and explain the findings better. Moreover, informed by the empirical observations, we show the potential of using guidance derived from an LLM's hidden representation space to mitigate hallucination. We believe this work provides insights into how LLMs produce hallucinated answers and how to make such answers occur less often.
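To make the idea of comparing hidden states concrete, here is a minimal sketch. It is not the paper's framework. It assumes we already have one hidden-state vector per response, uses synthetic Gaussian vectors in place of real model activations, and uses a simple mean-difference direction as an illustrative stand-in for the interpretation techniques the abstract mentions. The function names `difference_direction` and `projection_scores` are hypothetical.

```python
import numpy as np


def difference_direction(correct_states, hallucinated_states):
    """Unit vector pointing from the mean hallucinated state to the mean correct state."""
    diff = correct_states.mean(axis=0) - hallucinated_states.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("the two groups have identical mean hidden states")
    return diff / norm


def projection_scores(states, direction):
    """Scalar projection of each hidden state onto the direction."""
    return states @ direction


# Synthetic stand-ins for hidden states (assumption: real ones would be
# extracted from a model, one vector per question/answer pair).
rng = np.random.default_rng(0)
dim = 16
offset = np.zeros(dim)
offset[0] = 3.0  # artificial separation along one coordinate
correct = rng.normal(size=(200, dim)) + offset
hallucinated = rng.normal(size=(200, dim)) - offset

direction = difference_direction(correct, hallucinated)
correct_scores = projection_scores(correct, direction)
hallucinated_scores = projection_scores(hallucinated, direction)

# Midpoint threshold between the two group means along the direction.
threshold = 0.5 * (correct_scores.mean() + hallucinated_scores.mean())
accuracy = 0.5 * ((correct_scores > threshold).mean()
                  + (hallucinated_scores <= threshold).mean())
print(f"separation accuracy on synthetic data: {accuracy:.3f}")
```

If the two kinds of responses do occupy different regions of the representation space, projections onto such a direction separate them. The synthetic data here is built so that they do, which means the printed accuracy says nothing about real models.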

