CL · AI · LG · Mar 3, 2024

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

arXiv:2403.01548v3 · 52 citations · h-index: 9 · ICML
Originality: Incremental advance
AI Analysis

This work addresses hallucinations in LLMs, a critical reliability issue in applications such as knowledge-seeking tasks, although it is an incremental improvement over existing methods.

The study analyzes the inner representations of large language models to understand hallucinations. It finds that correct generations show sharper activations in the hidden states of the in-context tokens than incorrect ones, and it proposes a constrained decoding method that improves TruthfulQA performance by up to 8.6 points.

Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to the incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6-point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.
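The abstract describes the sharpness metric only at a high level. As a rough illustration, the sketch below assumes that, for each candidate next token, one activation score per in-context position has already been read off the hidden states (for example, by projecting each hidden state through the output embedding). It normalizes those scores with a softmax, treats the resulting entropy as an inverse measure of sharpness, and subtracts a weighted entropy penalty from the candidate's log-probability. The helper names, the `alpha` weight, the subtractive penalty, and the toy numbers are all hypothetical and are not taken from the paper or its released code.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def context_entropy(position_scores: List[float]) -> float:
    """Entropy of one candidate token's activation across in-context positions.

    `position_scores[i]` is a hypothetical activation of the candidate token at
    context position i. Low entropy means the activation is concentrated on a
    few positions ("sharp"); high entropy means it is spread out ("diffuse").
    """
    return entropy(softmax(position_scores))


def penalized_scores(
    logprobs: Dict[str, float],
    position_scores: Dict[str, List[float]],
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Subtract alpha * context entropy from each candidate's log-probability.

    Both `alpha` and the subtractive form are illustrative assumptions.
    """
    return {
        token: lp - alpha * context_entropy(position_scores[token])
        for token, lp in logprobs.items()
    }


# Toy example with made-up numbers: the model slightly prefers "Lyon",
# but only "Paris" has a sharp activation over the four context positions.
logprobs = {"Paris": -0.9, "Lyon": -0.8}
position_scores = {
    "Paris": [5.0, 0.0, 0.0, 0.0],
    "Lyon": [1.0, 1.0, 1.0, 1.0],
}
scores = penalized_scores(logprobs, position_scores, alpha=1.0)
best = max(scores, key=scores.get)
print(best)  # Paris
```

With these made-up numbers, "Lyon" has the higher raw log-probability, but its activation is spread evenly over the four context positions (entropy ln 4 ≈ 1.39), while the activation for "Paris" is concentrated on one position (entropy ≈ 0.12). After the penalty, "Paris" ranks first. Subtracting the penalty in log space is just one simple way to combine the two signals.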

Code Implementations: 1 repo
