Can LLMs Take Retrieved Information with a Grain of Salt?
This work addresses an underexplored problem that matters in high-stakes domains like medicine and finance: whether LLMs adapt their responses to the uncertainty of retrieved information. It also offers a practical strategy to improve reliability.
The paper evaluates eight LLMs on their ability to adjust responses based on the certainty of retrieved information, finding systematic limitations such as overtrusting complex contexts. The proposed interaction strategy reduces obedience errors by 25% on average without modifying model weights.
Large language models have demonstrated impressive retrieval-augmented capabilities. However, a crucial area remains underexplored: their ability to appropriately adapt responses to the certainty of the retrieved information. This gap has real consequences in high-stakes domains like medicine and finance. We evaluate eight LLMs on their context-certainty obedience, measuring how well they adjust responses to match expressed context certainty. Our analysis reveals systematic limitations: LLMs struggle to recall prior knowledge after observing an uncertain context, misinterpret expressed certainties, and overtrust complex contexts. To address these limitations, we propose an interaction strategy combining prior reminders, certainty recalibration, and context simplification. This approach reduces obedience errors by 25% on average, without modifying model weights, demonstrating the efficacy of interaction design in enhancing LLM reliability. Our contributions include a principled evaluation metric, empirical insights into LLMs' uncertainty handling, and a portable strategy to improve context-certainty obedience across diverse LLMs.
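To make the three components of the interaction strategy concrete, the following is a minimal sketch of how a prompt might combine a prior reminder, a certainty recalibration, and a simplified context. It is an illustration only, not the paper's implementation: the names `CERTAINTY_SCALE`, `RetrievedContext`, and `build_prompt`, the wording of the prompt, and the numeric values attached to each certainty phrase are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical mapping from verbal certainty phrases to probabilities.
# The values are illustrative assumptions, not taken from the paper.
CERTAINTY_SCALE = {
    "almost certainly": 0.95,
    "probably": 0.75,
    "possibly": 0.50,
    "unlikely": 0.20,
}


@dataclass
class RetrievedContext:
    claim: str      # retrieved content, already reduced to one plain sentence
    certainty: str  # verbal certainty expressed by the source


def build_prompt(question: str, prior_answer: str, ctx: RetrievedContext) -> str:
    """Assemble a prompt that applies the three ideas in order."""
    probability = CERTAINTY_SCALE[ctx.certainty]
    lines = [
        f"Question: {question}",
        # Prior reminder: restate what the model answered without any context.
        f"Before seeing any context, your answer was: {prior_answer}",
        # Context simplification: present only the short, plain claim.
        f"Retrieved claim: {ctx.claim}",
        # Certainty recalibration: pair the verbal phrase with an explicit number.
        f"The source states this '{ctx.certainty}', "
        f"which here means a probability of about {probability:.2f}.",
        "Weigh your earlier answer against the claim in proportion to that "
        "probability, then give your final answer.",
    ]
    return "\n".join(lines)


example_prompt = build_prompt(
    question="Is drug A safe to combine with drug B?",
    prior_answer="Yes",
    ctx=RetrievedContext(claim="Drug A interacts with drug B.", certainty="probably"),
)
```

Because a strategy like this only changes the text sent to the model, it can be applied to any LLM without touching model weights, which is the portability property the abstract emphasizes.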