UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models
This work addresses the reliability of LLM outputs for users in applications such as machine translation, summarization, and question answering, though it is an incremental improvement over existing UQ techniques.
The paper tackles length bias in uncertainty quantification (UQ) methods for large language models, many of which rely on token probabilities. It proposes UNCERTAINTY-LINE, a debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates, and it demonstrates consistent improvements over existing methods across multiple tasks and metrics.
Large Language Models (LLMs) have become indispensable tools across various applications, making it more important than ever to ensure the quality and the trustworthiness of their outputs. This has led to growing interest in uncertainty quantification (UQ) methods for assessing the reliability of LLM outputs. Many existing UQ techniques rely on token probabilities, which inadvertently introduces a bias with respect to the length of the output. While some methods attempt to account for this, we demonstrate that such biases persist even in length-normalized approaches. To address the problem, here we propose UNCERTAINTY-LINE (Length-INvariant Estimation), a simple debiasing procedure that regresses uncertainty scores on output length and uses the residuals as corrected, length-invariant estimates. Our method is post-hoc, model-agnostic, and applicable to a range of UQ measures. Through extensive evaluation on machine translation, summarization, and question-answering tasks, we demonstrate that UNCERTAINTY-LINE consistently improves uncertainty estimates over even nominally length-normalized UQ methods across multiple metrics and models.
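The debiasing step described above (regress uncertainty scores on output length, keep the residuals) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ordinary least-squares line fitted with NumPy, fits and applies the trend on the same sample, and uses synthetic data. The function names `fit_length_trend` and `debias_scores` are hypothetical, and the abstract does not specify any preprocessing of the scores or lengths.

```python
import numpy as np


def fit_length_trend(scores, lengths):
    """Fit a least-squares line: score ~ slope * length + intercept.

    Assumption: a plain linear fit; the regression form is illustrative only.
    """
    lengths = np.asarray(lengths, dtype=float)
    scores = np.asarray(scores, dtype=float)
    slope, intercept = np.polyfit(lengths, scores, deg=1)
    return float(slope), float(intercept)


def debias_scores(scores, lengths, slope, intercept):
    """Return residuals: the part of each score the length trend does not explain."""
    lengths = np.asarray(lengths, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return scores - (slope * lengths + intercept)


# Synthetic data (hypothetical): uncertainty scores that grow with output length.
rng = np.random.default_rng(0)
lengths = rng.integers(5, 200, size=500)
raw_scores = 0.05 * lengths + rng.normal(0.0, 1.0, size=500)

slope, intercept = fit_length_trend(raw_scores, lengths)
corrected = debias_scores(raw_scores, lengths, slope, intercept)

# Correlation with length before and after the correction.
corr_before = np.corrcoef(lengths, raw_scores)[0, 1]
corr_after = np.corrcoef(lengths, corrected)[0, 1]
print(f"correlation with length: before={corr_before:.3f}, after={corr_after:.3f}")
```

Because the step needs only the scores and the output lengths, it can be applied after the fact to any scalar UQ measure, which matches the post-hoc, model-agnostic framing. In practice the trend would more plausibly be fitted on held-out data and then applied to new outputs; that split is omitted here for brevity.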