CL · AI · Feb 17, 2025

Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning

ETH Zurich
arXiv:2502.11962v3 · 3 citations · h-index: 40
Originality: Incremental advance
AI Analysis

This work addresses a key reliability issue for users of LLMs in applications that require accurate and informative responses, though it is an incremental refinement of existing fine-tuning methods.

The paper tackles the trade-off between truthfulness and informativeness in instruction fine-tuning of large language models by introducing two new paradigms: UNIT_cut, which improves truthfulness, and UNIT_ref, which reduces hallucinations while maintaining informativeness.

Instruction fine-tuning (IFT) can increase the informativeness of large language models (LLMs), but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, $UNIT_{cut}$ and $UNIT_{ref}$, to address this issue. $UNIT_{cut}$ identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas $UNIT_{ref}$ trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that $UNIT_{cut}$ substantially improves LLM truthfulness, while $UNIT_{ref}$ maintains high informativeness and reduces hallucinations by distinguishing between confident and uncertain statements.
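The two paradigms differ in what they do with unfamiliar knowledge: UNIT_cut drops it from the training targets, while UNIT_ref keeps it and appends an explicit statement of uncertainty. The sketch below illustrates only that contrast, under loudly stated assumptions. It treats each IFT response as a list of claims that already carry an uncertainty score. The `Claim` and `IFTExample` types, the claim-level granularity, the `0.5` threshold, and the wording of the appended note are all hypothetical. This is not the authors' implementation, and the sketch does not show how the paper actually estimates uncertainty.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off; NOT a value taken from the paper.
UNCERTAINTY_THRESHOLD = 0.5


@dataclass
class Claim:
    text: str
    # Hypothetical score in [0, 1]: higher means the knowledge is less familiar
    # to the pre-trained model. How it is computed is outside this sketch.
    uncertainty: float


@dataclass
class IFTExample:
    instruction: str
    claims: List[Claim]


def unit_cut(example: IFTExample, threshold: float = UNCERTAINTY_THRESHOLD) -> str:
    """UNIT_cut-style target: keep only the claims the model is familiar with."""
    kept = [c.text for c in example.claims if c.uncertainty < threshold]
    return " ".join(kept)


def unit_ref(example: IFTExample, threshold: float = UNCERTAINTY_THRESHOLD) -> str:
    """UNIT_ref-style target: keep every claim, then flag the uncertain ones at the end."""
    response = " ".join(c.text for c in example.claims)
    uncertain = [c.text for c in example.claims if c.uncertainty >= threshold]
    if not uncertain:
        return response
    # Hypothetical wording for the end-of-response uncertainty statement.
    note = "I am uncertain about the following: " + "; ".join(uncertain)
    return response + "\n\n" + note


# Illustrative example with placeholder claims and made-up scores.
ex = IFTExample(
    instruction="Tell me about a topic.",
    claims=[
        Claim("Paris is the capital of France.", 0.05),
        Claim("A long-tail detail the base model rarely saw.", 0.8),
    ],
)
print(unit_cut(ex))
print(unit_ref(ex))
```

In this sketch, UNIT_cut yields a shorter target that contains only the familiar claim, which trades informativeness for truthfulness. UNIT_ref keeps the full response and appends a note that separates confident statements from uncertain ones.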
