LG · AI · CL · Apr 11

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

arXiv:2505.11737 · 46.019 citations · h-index: 10
Predicted impact: top 9% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For LLM users and developers, TokUR provides a scalable way to assess and enhance response reliability in multi-step reasoning tasks, though it is incremental over existing uncertainty methods.

TokUR introduces a token-level uncertainty estimation method using low-rank random weight perturbations during LLM decoding, achieving strong correlation with answer correctness and improving reasoning performance on math datasets.

While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality remains inconsistent across various application scenarios, making it difficult to identify trustworthy responses, especially in complex tasks requiring multi-step reasoning. In this paper, we propose a Token-level Uncertainty estimation framework for Reasoning (TokUR) that enables LLMs to self-assess and self-improve their responses in mathematical reasoning. Specifically, we introduce low-rank random weight perturbation during LLM decoding to generate predictive distributions for token-level uncertainty estimation, and we aggregate these uncertainty quantities to capture the semantic uncertainty of generated responses. Experiments on mathematical reasoning datasets of varying difficulty demonstrate that TokUR exhibits a strong correlation with answer correctness and model robustness, and the uncertainty signals produced by TokUR can be leveraged to enhance the model's reasoning performance at test time. These results highlight the effectiveness of TokUR as a principled and scalable approach for improving the reliability and interpretability of LLMs in challenging reasoning tasks.
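The perturb-and-aggregate idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single output projection matrix `W` stands in for the LLM, that the perturbation is a product of two Gaussian factors `A @ B`, and that token uncertainty is read off with the standard entropy decomposition (total = aleatoric + epistemic). The names `token_uncertainty`, `sequence_uncertainty`, `rank`, `sigma`, and `n_samples` are hypothetical, and mean aggregation over tokens is only one possible choice.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def token_uncertainty(h, W, rng, rank=4, sigma=0.5, n_samples=16):
    """Toy estimate for one decoding step.

    h: (d,) hidden state; W: (vocab, d) output projection.
    Returns (total, aleatoric, epistemic) in nats.
    """
    vocab, d = W.shape
    probs = []
    for _ in range(n_samples):
        # Assumed low-rank perturbation: delta = A @ B has rank <= `rank`.
        A = rng.normal(0.0, sigma, size=(vocab, rank))
        B = rng.normal(0.0, 1.0 / np.sqrt(d), size=(rank, d))
        probs.append(softmax((W + A @ B) @ h))
    probs = np.stack(probs)                # (n_samples, vocab)
    total = entropy(probs.mean(axis=0))    # entropy of the averaged prediction
    aleatoric = entropy(probs).mean()      # average entropy per sample
    return total, aleatoric, total - aleatoric

def sequence_uncertainty(H, W, rng, **kwargs):
    """Aggregate token-level epistemic uncertainty by averaging over tokens."""
    return float(np.mean([token_uncertainty(h, W, rng, **kwargs)[2] for h in H]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(50, 8))   # toy vocabulary of 50 tokens
    H = rng.normal(size=(5, 8))    # hidden states for 5 generated tokens
    print(sequence_uncertainty(H, W, rng))
```

A response-level score like the one returned by `sequence_uncertainty` could then be used to rank several sampled answers to the same question, which is one way the uncertainty signal might be used at test time.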

