Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
This work addresses reliability issues that arise when deploying LLMs in real-world scenarios, though it appears incremental because it builds on existing retrieval and calibration methods.
The paper aims to improve LLM reliability in open-domain question answering by proposing Deliberative Searcher, a framework that integrates certainty calibration with retrieval-based search using reinforcement learning with constraints. The results show improved alignment between model confidence and correctness, leading to more trustworthy outputs.
Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose \textbf{Deliberative Searcher}, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves the alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.
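The abstract does not spell out how the soft reliability constraint enters training. One common way to optimize accuracy under a soft constraint is Lagrangian relaxation, sketched below in Python. Everything here is an illustrative assumption rather than the paper's method: the `Episode` record, the `violation` cost (the gap between stated confidence and correctness), the `budget` threshold, and the dual-ascent update are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical record of one question-answering rollout."""
    correct: bool       # whether the final answer matched the reference
    confidence: float   # the agent's stated certainty, in [0, 1]


def violation(ep: Episode) -> float:
    """Assumed reliability cost: gap between stated confidence and correctness."""
    return abs(ep.confidence - (1.0 if ep.correct else 0.0))


def shaped_reward(ep: Episode, lam: float) -> float:
    """Accuracy reward minus the multiplier-weighted reliability cost."""
    return (1.0 if ep.correct else 0.0) - lam * violation(ep)


def update_multiplier(lam: float, batch: List[Episode],
                      budget: float, lr: float) -> float:
    """Dual ascent step: raise lam when the mean cost exceeds the budget,
    lower it otherwise, and never let it drop below zero."""
    mean_cost = sum(violation(ep) for ep in batch) / len(batch)
    return max(0.0, lam + lr * (mean_cost - budget))
```

In this sketch, a policy-gradient learner would use `shaped_reward` as the per-episode return and call `update_multiplier` once per batch. The constraint is soft in the sense that overconfident wrong answers are penalized in proportion to the multiplier rather than forbidden outright.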