CL AI Aug 30, 2023

Quantifying Uncertainty in Answers from any Language Model and Enhancing their Trustworthiness

arXiv:2308.16175v222.6147 citations h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses the trustworthiness of LLM outputs for users who rely on black-box APIs, offering a practical tool for uncertainty quantification that requires no model retraining. The advance is incremental, since it builds on existing uncertainty estimation techniques.

The paper tackles unreliable outputs from large language models by introducing BSDetector, a method that estimates a confidence score for any LLM response using only black-box API access. The scores let users identify likely incorrect answers and improve accuracy by sampling several responses and selecting the one with the highest confidence. In experiments on closed- and open-form Question-Answer benchmarks, BSDetector identifies incorrect responses more accurately than alternative uncertainty estimation methods, for both GPT-3 and ChatGPT.

We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generated. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, whose training data remains unknown. By expending a bit of extra computation, users of any LLM API can now get the same response as they would ordinarily, as well as a confidence estimate that cautions when not to trust this response. Experiments on both closed and open-form Question-Answer benchmarks reveal that BSDetector more accurately identifies incorrect LLM responses than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and considering the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps. In applications involving automated evaluation with LLMs, accounting for our confidence scores leads to more reliable evaluation in both human-in-the-loop and fully-automated settings (across both GPT 3.5 and 4).
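The selection step described in the abstract (sample several responses, keep the one with the highest confidence score) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `agreement_confidence` and `select_most_confident` are hypothetical, and the scoring rule shown here (the fraction of samples that agree with a given answer) is only a stand-in assumption for BSDetector's actual confidence estimate, which this page does not specify. The sampled answers are passed in as plain strings, so no particular LLM API is assumed.

```python
from typing import Callable, List, Tuple


def agreement_confidence(answer: str, samples: List[str]) -> float:
    """Stand-in confidence score (an assumption, not BSDetector's scoring):
    the fraction of sampled answers that match `answer` after trimming
    whitespace and lowercasing."""
    if not samples:
        return 0.0
    target = answer.strip().lower()
    matches = sum(1 for s in samples if s.strip().lower() == target)
    return matches / len(samples)


def select_most_confident(
    samples: List[str],
    score: Callable[[str, List[str]], float] = agreement_confidence,
) -> Tuple[str, float]:
    """Return the sampled answer with the highest confidence and its score.
    On ties, the earliest sample wins."""
    if not samples:
        raise ValueError("need at least one sampled response")
    scored = [(s, score(s, samples)) for s in samples]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical responses, standing in for repeated calls to an LLM API.
sampled = ["Paris", "paris ", "Lyon", "Paris"]
best_answer, confidence = select_most_confident(sampled)
```

Any scoring function with the same signature can be passed as `score`, so a real confidence estimator could replace the agreement-based stand-in without changing the selection logic.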
