CL LG · Oct 20, 2025

Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations

Tong Chen, Akari Asai, Luke Zettlemoyer, Hannaneh Hajishirzi, Faeze Brahman

arXiv:2510.17733v16.73 citations · h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses the tradeoff between reducing hallucinations and preserving performance on other tasks, which matters for practical language generation and question answering.

The paper tackles extrinsic hallucination, where language models generate factually incorrect information unsupported by their training data. It proposes an online reinforcement learning method with a binary retrieval-augmented reward (RAR). On Qwen3 reasoning models, the method reduces hallucination rates by 39.3% in open-ended generation and yields 44.4% and 21.7% fewer incorrect answers on PopQA and GPQA, respectively, without degrading performance on instruction following, math, or code.

Language models often generate factually incorrect information unsupported by their training data, a phenomenon known as extrinsic hallucination. Existing mitigation approaches often degrade performance on open-ended generation and downstream tasks, limiting their practical utility. We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR) to address this tradeoff. Unlike continuous reward schemes, our approach assigns a reward of one only when the model's output is entirely factually correct, and zero otherwise. We evaluate our method on Qwen3 reasoning models across diverse tasks. For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates, substantially outperforming both supervised training and continuous-reward RL baselines. In short-form question answering, the model learns calibrated abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge. This yields 44.4% and 21.7% fewer incorrect answers on PopQA and GPQA, respectively. Crucially, these factuality gains come without performance degradation on instruction following, math, or code, whereas continuous-reward RL, despite improving factuality, induces quality regressions.
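
The contrast between the binary reward and a continuous reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the retrieval and verification steps are left out, and each response is represented by a hypothetical list of per-claim verdicts (`True` if a claim is supported by retrieved evidence). The function names, the fraction-based continuous reward, and the choice to treat a response with no checkable claims (such as an abstention) as error-free are all assumptions made for illustration.

```python
from typing import Sequence


def binary_reward(claim_supported: Sequence[bool]) -> float:
    """Return 1.0 only if every claim is supported, otherwise 0.0.

    Assumption: a response with no checkable claims contains no errors,
    so it also receives 1.0.
    """
    return 1.0 if all(claim_supported) else 0.0


def continuous_reward(claim_supported: Sequence[bool]) -> float:
    """Return the fraction of supported claims (an assumed continuous scheme)."""
    if not claim_supported:
        return 1.0
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verifier verdicts for three responses.
mostly_right = [True, True, True, False]  # one unsupported claim
fully_right = [True, True]                # every claim supported
abstention = []                           # "I don't know": nothing to check

for name, verdicts in [
    ("mostly_right", mostly_right),
    ("fully_right", fully_right),
    ("abstention", abstention),
]:
    print(name, binary_reward(verdicts), continuous_reward(verdicts))
```

Under these assumptions, a response with a single unsupported claim earns partial credit from the continuous scheme but nothing from the binary one, which is the all-or-nothing behavior the abstract describes.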
