AI, CL, SP | Oct 20, 2024

Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

Shahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi

arXiv:2410.15460v5 | 8.54 citations | h-index: 7 | ACL

Originality: Incremental advance

AI Analysis

This work addresses reliability concerns for LLM users in domains such as Wikipedia, Medical, Legal, and Coding. It is an incremental advance, since it builds on existing training methods.

The paper tackles the problem of hallucinations in large language models by proposing Sensitivity Dropout (SenD), a training protocol that reduces hallucination variance, resulting in up to 17% improved test-time reliability and enhanced factual accuracy across multiple domains without harming downstream performance.

As large language models (LLMs) become increasingly prevalent, concerns about their reliability have grown, particularly because of hallucinations: factually inaccurate or irrelevant outputs. Our research investigates the relationship between uncertainty in training dynamics and the emergence of hallucinations. Using models from the Pythia suite and several hallucination detection metrics, we analyze hallucination trends and identify significant variance during training. To address this, we propose Sensitivity Dropout (SenD), a novel training protocol that reduces hallucination variance during training by deterministically dropping embedding indices with significant variability. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at twice the speed. This metric is integrated into our training protocol, making SenD both computationally scalable and effective at reducing hallucination variance. SenD improves the test-time reliability of Pythia and Meta's Llama models by up to 17% and enhances factual accuracy in the Wikipedia, Medical, Legal, and Coding domains without affecting downstream task performance.
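The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several details are assumptions. `eigenscore` computes the exact EigenScore as it is commonly defined, up to normalization: the mean log-eigenvalue of the regularized covariance of K sentence embeddings drawn from K sampled responses. EES approximates this quantity more cheaply, but that approximation is not reproduced here. `sensitive_indices` and `drop_mask` are hypothetical helpers. They assume "significant variability" can be measured as the per-index variance of one layer's activations across C training checkpoints, and that the most variable indices are zeroed with a fixed mask rather than at random.

```python
import numpy as np


def eigenscore(embeddings: np.ndarray, alpha: float = 1e-3) -> float:
    """EigenScore-style uncertainty for K sampled responses (exact, not EES).

    embeddings: shape (K, d), one sentence embedding per sampled response.
    Higher values mean the K responses are more semantically spread out,
    which is commonly read as a sign of possible hallucination.
    """
    K, _ = embeddings.shape
    # Center each embedding across its d features (the J_d centering matrix).
    centered = embeddings - embeddings.mean(axis=1, keepdims=True)
    cov = centered @ centered.T  # K x K covariance Z^T J_d Z
    # alpha * I keeps every eigenvalue strictly positive so the log is defined.
    eigvals = np.linalg.eigvalsh(cov + alpha * np.eye(K))
    return float(np.mean(np.log(eigvals)))


def sensitive_indices(checkpoint_acts: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical sensitivity ranking (an assumption, not SenD's exact rule).

    checkpoint_acts: shape (C, d), activations of one embedding layer for the
    same input, recorded at C consecutive training checkpoints.
    Returns the k embedding indices with the largest variance across checkpoints.
    """
    variability = checkpoint_acts.var(axis=0)  # per-index variance over checkpoints
    return np.argsort(variability)[::-1][:k]


def drop_mask(d: int, dropped: np.ndarray) -> np.ndarray:
    """Deterministic mask: 0.0 at the dropped indices, 1.0 everywhere else."""
    mask = np.ones(d)
    mask[dropped] = 0.0
    return mask


if __name__ == "__main__":
    # Index 3 swings between +5 and -5 across checkpoints; the others barely move.
    acts = np.array([[0.0, 1.0, 0.0, 5.0],
                     [0.0, 1.0, 0.0, -5.0],
                     [0.0, 1.1, 0.0, 5.0],
                     [0.0, 0.9, 0.0, -5.0]])
    top = sensitive_indices(acts, k=1)
    print(top, drop_mask(acts.shape[1], top))
```

In a training loop, the mask would multiply the chosen layer's hidden states, so the flagged indices are silenced deterministically instead of being sampled as in standard dropout.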

