Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination
This work addresses unreliable LLM outputs, a critical issue for users who depend on accurate information, though it is an incremental improvement built on existing fine-tuning methods.
The researchers tackled hallucination in large language models by creating a targeted supervised fine-tuning dataset that teaches models to recognize the limits of their knowledge and admit uncertainty. Replacing generic instruction data with this dataset significantly improved hallucination metrics (median increases of 0.19% to 25.91% on HypoTerm Score and +0.39% to +0.86% on FactScore) while keeping MMLU performance stable (decreases of 0.26% to 0.35%).
Large language models (LLMs) often hallucinate, producing fluent but false information, partly because supervised fine-tuning (SFT) implicitly rewards always responding. We introduce $\textit{HypoTermInstruct}$, an SFT dataset (31,487 responses for 11,151 questions) designed to teach models epistemological humility: the ability to recognize the limits of their own knowledge and admit uncertainty. The dataset does this through questions about non-existent "hypothetical" terms. We also release $\textit{HypoTermQA-Enhanced}$, a benchmark for hallucination tendency strengthened through multiple validations. We conducted 800 controlled LoRA SFT runs across $\textit{Llama3.1-8B}$ and $\textit{Gemma3-4B}$ (base and instruct), testing 100 fine-tuning configurations with paired controls. Replacing generic instruction data with $\textit{HypoTermInstruct}$ significantly improves the HypoTerm Score (median increases of 0.19% to 25.91%) and FactScore (+0.39% to +0.86%), while maintaining stable performance on MMLU (minimal decreases of 0.26% to 0.35%). These results show that targeted, high-quality SFT data teaching meta-cognitive skills can effectively reduce hallucination without preference/RL pipelines, providing mechanistic insights and a practical path toward more reliable AI systems.
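The run count and paired-control design above can be illustrated with a minimal sketch. It assumes the 800 runs decompose as 2 model families x 2 variants x 100 configurations x 2 arms (control and treatment), which matches the stated totals but is not spelled out in the abstract. The function names `enumerate_runs` and `build_mix`, the `n_replace` parameter, and the random-swap procedure are hypothetical illustrations, not the authors' pipeline.

```python
import random
from itertools import product

# Names taken from the abstract; the arm labels are assumed for illustration.
MODELS = ["Llama3.1-8B", "Gemma3-4B"]
VARIANTS = ["base", "instruct"]
ARMS = ["control", "treatment"]
N_CONFIGS = 100  # fine-tuning configurations per model variant


def enumerate_runs():
    """List every (model, variant, config, arm) combination as one run."""
    return [
        {"model": m, "variant": v, "config_id": c, "arm": a}
        for m, v, c, a in product(MODELS, VARIANTS, range(N_CONFIGS), ARMS)
    ]


def build_mix(generic, targeted, n_replace, arm, seed=0):
    """Build a training mix of fixed size len(generic).

    The control arm keeps the generic instruction data unchanged.
    The treatment arm swaps n_replace generic samples for targeted ones,
    so both arms train on the same number of examples (hypothetical scheme).
    """
    if arm == "control":
        return list(generic)
    rng = random.Random(seed)
    kept = rng.sample(generic, len(generic) - n_replace)
    swapped_in = rng.sample(targeted, n_replace)
    return kept + swapped_in


if __name__ == "__main__":
    print(len(enumerate_runs()))
```

Holding the mix size constant across arms is the point of the pairing: any difference between a treatment run and its control can then be attributed to which data was used rather than to how much.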