CL | Jan 27, 2024

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

arXiv:2401.15449v1 | 57 citations | h-index: 4 | KNOWLEDGE | NLP
Originality: Incremental advance
AI Analysis

This paper addresses unreliable factual generation in LLMs, which matters for applications that require high accuracy. The advance is incremental because it builds on existing methods such as reinforcement learning.

The paper tackles factual hallucination in Large Language Models (LLMs). It finds that models are largely aware of their own internal knowledge state, with over 85% accuracy in knowledge probing, and develops a training framework that improves their ability to use that knowledge, yielding better performance on knowledge-based and honesty-related tasks.

We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge probing. However, LLMs often fail to express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, Dreamcatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as reward, we propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.

