CL · AI · May 23, 2025

Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

arXiv:2505.17558v1 · 3 citations · h-index: 12
Originality: Incremental advance
AI Analysis

The paper addresses hallucination detection in LLMs for applications that require high accuracy. It is an incremental advance: it builds on existing DPO methods and contributes a new training strategy.

The paper aligns large language models to detect hallucinations by using carefully engineered hallucinated text as negative examples in a DPO alignment procedure with curriculum learning. The resulting models improve performance by up to 24% on benchmarks such as MedHallu and HaluEval and remain robust in zero-shot settings.
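For readers unfamiliar with DPO, the per-pair objective can be sketched as follows. This is the standard DPO loss from the general literature, not code from the paper. Treating a faithful response as "chosen" and the engineered hallucination as "rejected" is an assumption based on the summary above, and the function name `dpo_loss` and the default `beta` are illustrative.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed token log-probability of a complete response
    under the trainable policy or the frozen reference model. Here the
    "rejected" response is assumed to be a hallucinated sample.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy favours `chosen` more than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in an overflow-safe form.
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
```

With identical log-probabilities, as in `dpo_loss(-10.0, -10.0, -10.0, -10.0)`, the margin is zero and the loss equals log 2 (about 0.693). It falls as the policy, relative to the reference, shifts probability away from the hallucinated response.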

Aligning large language models (LLMs) to accurately detect hallucinations remains a significant challenge due to the sophisticated nature of hallucinated text. Recognizing that hallucinated samples typically exhibit higher deceptive quality than traditional negative samples, we use these carefully engineered hallucinations as negative examples in the DPO alignment procedure. Our method incorporates a curriculum learning strategy, gradually transitioning training from easier samples, identified by the greatest reduction in probability scores from independent fact-checking models, to progressively harder ones. This structured difficulty scaling ensures stable and incremental learning. Experimental evaluation demonstrates that our HaluCheck models, trained with a curriculum DPO approach and high-quality negative samples, significantly improve model performance across various metrics, achieving improvements of up to 24% on difficult benchmarks like MedHallu and HaluEval. Additionally, HaluCheck models demonstrate robustness in zero-shot settings, significantly outperforming larger state-of-the-art models across various benchmarks.
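The curriculum ordering described in the abstract can be sketched as below. This is a hypothetical illustration, not the authors' implementation. It assumes each pair carries a fact-checker probability that the chosen and rejected responses are factual, defines the "reduction" as the drop between the two scores, and splits the easiest-first ordering into near-equal stages. The names `PreferencePair`, `score_drop`, and `curriculum_stages` and the default number of stages are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str            # assumed: a faithful, grounded response
    rejected: str          # assumed: an engineered hallucinated response
    score_chosen: float    # hypothetical fact-checker P(factual) for `chosen`
    score_rejected: float  # hypothetical fact-checker P(factual) for `rejected`

    @property
    def score_drop(self) -> float:
        # Assumed difficulty signal: a large drop means the fact-checker
        # separates the pair easily, so the sample counts as "easy".
        return self.score_chosen - self.score_rejected


def curriculum_stages(pairs: list[PreferencePair],
                      num_stages: int = 3) -> list[list[PreferencePair]]:
    """Sort pairs easiest-first and split them into `num_stages` contiguous stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(pairs, key=lambda p: p.score_drop, reverse=True)
    # Spread any remainder over the earliest stages so sizes differ by at most one.
    base, extra = divmod(len(ordered), num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        end = start + base + (1 if i < extra else 0)
        stages.append(ordered[start:end])
        start = end
    return stages
```

Under this sketch, DPO training would iterate over the stages in order, easiest first. Whether the paper replays earlier stages, uses cumulative stages, or uses a different split is not stated in the abstract.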

