CL · AI · Oct 28, 2025

Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward

arXiv:2510.24020v1 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the critical problem of reliable LLM deployment by improving abstention mechanisms, though it is an incremental advance because it builds on existing abstention fine-tuning methods.

The paper tackles hallucinations in Large Language Models (LLMs) by proposing a reinforcement learning framework with fine-grained semantic confidence rewards, which significantly improves reliability on both in-domain and out-of-distribution benchmarks.

Mitigating hallucinations in Large Language Models (LLMs) is critical for their reliable deployment. Existing methods typically fine-tune LLMs to abstain from answering questions beyond their knowledge scope. However, these methods often rely on coarse-grained signals to guide LLMs to abstain, such as overall confidence or uncertainty scores on multiple sampled answers, which may result in an imprecise awareness of the model's own knowledge boundaries. To this end, we propose a novel reinforcement learning framework built on Fine-grained Semantic Confidence Reward (FiSCoRe), which guides LLMs to abstain via sample-specific confidence. Specifically, our method operates by sampling multiple candidate answers and conducting semantic clustering, then training the LLM to retain answers within high-confidence clusters and discard those within low-confidence ones, thereby promoting accurate post-hoc abstention. Additionally, we propose a new metric for evaluating the reliability of abstention fine-tuning tasks more comprehensively. Our method significantly enhances reliability on both in-domain and out-of-distribution benchmarks.
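The sample, cluster, and reward loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the clustering criterion, the confidence formula, or the reward values, so every helper here (`normalize`, `cluster_answers`, `sample_confidences`, `question_confidence`, `abstention_rewards`), the string-normalization stand-in for semantic equivalence (real systems would use something like bidirectional entailment), the cluster-size confidence, the 0.5 threshold, and the +1/-1 reward are hypothetical choices. The sketch also contrasts a coarse per-question score with the per-sample scores the abstract argues for.

```python
from collections import Counter
from typing import Callable, List


def normalize(answer: str) -> str:
    """Hypothetical stand-in for semantic equivalence: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cluster_answers(answers: List[str], equivalent: Callable[[str, str], bool]) -> List[int]:
    """Greedy clustering: each answer joins the first cluster whose representative it matches."""
    representatives: List[str] = []
    labels: List[int] = []
    for ans in answers:
        for idx, rep in enumerate(representatives):
            if equivalent(ans, rep):
                labels.append(idx)
                break
        else:  # no existing cluster matched, so open a new one
            representatives.append(ans)
            labels.append(len(representatives) - 1)
    return labels


def question_confidence(labels: List[int]) -> float:
    """Coarse-grained signal: one score per question (share of the largest cluster)."""
    if not labels:
        return 0.0
    return max(Counter(labels).values()) / len(labels)


def sample_confidences(labels: List[int]) -> List[float]:
    """Fine-grained signal (assumed form): each sample gets the share of its own cluster."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[lab] / n for lab in labels]


def abstention_rewards(confidences: List[float], keep_decisions: List[bool],
                       threshold: float = 0.5) -> List[float]:
    """Hypothetical reward: +1 if the model keeps a high-confidence answer or
    discards a low-confidence one, -1 otherwise."""
    rewards = []
    for conf, keep in zip(confidences, keep_decisions):
        should_keep = conf >= threshold
        rewards.append(1.0 if keep == should_keep else -1.0)
    return rewards


if __name__ == "__main__":
    answers = ["Paris", "paris.", "Lyon", "Paris", "Marseille"]
    labels = cluster_answers(answers, lambda a, b: normalize(a) == normalize(b))
    print(labels)                               # [0, 0, 1, 0, 2]
    print(question_confidence(labels))          # 0.6
    confs = sample_confidences(labels)
    print(confs)                                # [0.6, 0.6, 0.2, 0.6, 0.2]
    # Simulated post-hoc keep/discard decisions from the policy model.
    decisions = [True, True, False, False, True]
    print(abstention_rewards(confs, decisions)) # [1.0, 1.0, 1.0, -1.0, -1.0]
```

The coarse score (0.6) treats the whole question identically, while the per-sample scores let the reward penalize keeping the isolated "Marseille" answer and discarding one of the majority "Paris" answers. In an actual RL setup, a function like `abstention_rewards` would be plugged into the trainer's reward hook; that wiring is outside this sketch.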

