CL · Oct 29, 2024

Distinguishing Ignorance from Error in LLM Hallucinations

Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

arXiv:2410.22071v29.116 citations · h-index: 55 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses LLM hallucinations, a concern for users who rely on accurate outputs. It is an incremental advance, building on existing detection and mitigation efforts.

The paper tackles LLM hallucinations by distinguishing two types: HK- (the model lacks the required knowledge) and HK+ (the model answers incorrectly despite having that knowledge). It finds that HK+ hallucinations are prevalent, that distinguishing the two types helps mitigation, and that different models hallucinate on different examples.
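As a rough illustration of the taxonomy, the sketch below labels a model's answer as correct, HK-, or HK+ depending on whether the model shows the relevant knowledge in a separate probe. How "knowledge" is operationalized here (exact-match agreement among answers sampled under a neutral prompt, with an arbitrary 0.5 threshold) is an assumption made for illustration, not the authors' procedure; `Example`, `knows`, and `label` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    CORRECT = "correct"
    HK_MINUS = "HK-"  # wrong answer, and the model lacks the required knowledge
    HK_PLUS = "HK+"   # wrong answer, although the model holds the knowledge


@dataclass
class Example:
    question: str
    gold: str
    # Hypothetical: answers sampled from the model under a neutral prompt.
    probe_answers: list[str]
    # Answer the model gave in the setting being evaluated.
    answer: str


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison (a simplifying assumption).
    return " ".join(text.lower().split())


def knows(ex: Example, threshold: float = 0.5) -> bool:
    # Assumption: the model "knows" the answer if at least `threshold`
    # of its probe answers match the gold answer.
    if not ex.probe_answers:
        return False
    hits = sum(normalize(a) == normalize(ex.gold) for a in ex.probe_answers)
    return hits / len(ex.probe_answers) >= threshold


def label(ex: Example) -> Label:
    if normalize(ex.answer) == normalize(ex.gold):
        return Label.CORRECT
    return Label.HK_PLUS if knows(ex) else Label.HK_MINUS


# Toy examples (illustrative only).
hk_plus = Example("What is the capital of France?", "Paris",
                  ["Paris", "paris", "Paris"], "Lyon")
hk_minus = Example("Toy question 2", "blue", ["red", "green", "red"], "red")
correct = Example("Toy question 3", "yes", ["yes", "no", "yes"], "Yes")

for ex in (hk_plus, hk_minus, correct):
    print(ex.question, "->", label(ex).value)
```

The key design point is that the same wrong answer gets a different label depending on a separate knowledge check, which is what lets HK+ cases be treated differently from HK- cases during mitigation.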

Large language models (LLMs) are susceptible to hallucinations (factually incorrect outputs), leading to a large body of work on detecting and mitigating such cases. We argue that it is important to distinguish between two types of hallucinations: ones where the model does not hold the correct answer in its parameters, which we term HK-, and ones where the model answers incorrectly despite having the required knowledge, termed HK+. We first find that HK+ hallucinations are prevalent and occur across models and datasets. Then, we demonstrate that distinguishing between these two cases is beneficial for mitigating hallucinations. Importantly, we show that different models hallucinate on different examples, which motivates constructing model-specific hallucination datasets for training detectors. Overall, our findings draw attention to classifying types of hallucinations and provide means to handle them more effectively. The code is available at https://github.com/technion-cs-nlp/hallucination-mitigation.
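The observation that different models hallucinate on different examples can be made concrete with a small sketch: given per-example labels for each model, compare the models' HK+ sets and build a separate detector training set per model. The labels below are invented placeholders, not results from the paper; `hk_plus_ids`, `jaccard`, and `build_detector_dataset` are hypothetical helpers, and Jaccard overlap is simply an illustrative choice of comparison.

```python
from __future__ import annotations

# Hypothetical per-example labels for two models (placeholders, not paper data).
labels = {
    "model_a": {"q1": "HK+", "q2": "correct", "q3": "HK-", "q4": "HK+"},
    "model_b": {"q1": "correct", "q2": "HK+", "q3": "HK-", "q4": "HK+"},
}


def hk_plus_ids(per_example: dict[str, str]) -> set[str]:
    # Example ids on which this model hallucinates despite having the knowledge.
    return {qid for qid, lab in per_example.items() if lab == "HK+"}


def jaccard(a: set[str], b: set[str]) -> float:
    # Overlap between two sets; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_detector_dataset(per_example: dict[str, str]) -> list[tuple[str, bool]]:
    # Model-specific training pairs: (example id, is_hallucination),
    # derived from this model's own labels rather than a shared label set.
    return sorted((qid, lab != "correct") for qid, lab in per_example.items())


a = hk_plus_ids(labels["model_a"])
b = hk_plus_ids(labels["model_b"])
print("HK+ overlap (Jaccard):", jaccard(a, b))
print("model_a training data:", build_detector_dataset(labels["model_a"]))
print("model_b training data:", build_detector_dataset(labels["model_b"]))
```

In this toy case, a detector trained on model_a's labels would treat q1 as a hallucination and q2 as correct, the reverse of model_b's labels, which is the kind of mismatch that argues for per-model datasets.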

