Unconditional Truthfulness: Learning Conditional Dependency for Uncertainty Quantification of Large Language Models
This work addresses uncertainty estimation for detecting hallucinations in LLMs and represents an incremental advance within domain-specific methods.
The paper tackles the problem of uncertainty quantification in autoregressive large language models by learning conditional dependencies from attention-based features, achieving substantial improvements in selective generation across ten datasets and three LLMs.
Uncertainty quantification (UQ) has emerged as a promising approach for detecting hallucinations and low-quality output of Large Language Models (LLMs). However, obtaining proper uncertainty scores is complicated by the conditional dependency between the generation steps of an autoregressive LLM, which is hard to model explicitly. Here, we propose to learn this dependency from attention-based features. In particular, we train a regression model that leverages LLM attention maps, probabilities at the current generation step, and recurrently computed uncertainty scores from previously generated tokens. To incorporate the recurrent features, we also suggest a two-stage training procedure. Our experimental evaluation on ten datasets and three LLMs shows that the proposed method is highly effective for selective generation, achieving substantial improvements over competing unsupervised and supervised approaches.
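The abstract does not specify the regressor, the exact attention features, or the training targets, so the following is a minimal sketch under explicit assumptions: a linear ridge regressor, attention weights from each token to its immediate predecessor (one value per head), and synthetic per-token targets. It illustrates the recurrent structure (each token's predicted uncertainty becomes an input feature for the next token) and one plausible reading of the two-stage procedure: stage 1 uses the ground-truth target of the previous token as the recurrent feature, and stage 2 refits on the stage-1 model's own recurrent predictions so that training matches inference. `selective_quality` illustrates selective generation by rejecting the most uncertain outputs. All function names are hypothetical and are not the authors' code.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with an unpenalized bias column (assumed regressor)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict_ridge(w, x):
    """Predict for one feature vector (1-D) or a batch of them (2-D)."""
    x = np.atleast_2d(x)
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ w


def build_features(attn, probs, prev_u):
    """Per-token features (hypothetical choice): attention to the previous token
    per head, probability of the generated token, previous token's uncertainty."""
    return np.column_stack([attn, probs, prev_u])


def shift(u, init=0.0):
    """Return u_{t-1} for every t; the first token gets an assumed initial value."""
    return np.concatenate([[init], u[:-1]])


def run_recurrent(w, attn, probs, init=0.0):
    """Inference: compute token uncertainties left to right, feeding each one forward."""
    u = np.empty(len(probs))
    prev = init
    for t in range(len(probs)):
        x = np.concatenate([attn[t], [probs[t], prev]])
        u[t] = predict_ridge(w, x)[0]
        prev = u[t]
    return u


def two_stage_train(seqs, lam=1e-3):
    """seqs: list of (attn[T, H], probs[T], targets[T]).
    Stage 1 (assumption): recurrent feature = ground-truth target of the previous token.
    Stage 2 (assumption): recurrent feature = stage-1 model's own recurrent predictions."""
    y_all = np.concatenate([y for _, _, y in seqs])
    X1 = np.vstack([build_features(a, p, shift(y)) for a, p, y in seqs])
    w1 = fit_ridge(X1, y_all, lam)
    X2 = np.vstack([build_features(a, p, shift(run_recurrent(w1, a, p))) for a, p, _ in seqs])
    return fit_ridge(X2, y_all, lam)


def selective_quality(uncertainty, quality, keep_frac):
    """Mean quality of the outputs kept after rejecting the most uncertain ones."""
    order = np.argsort(uncertainty)  # most confident outputs first
    k = max(1, int(round(keep_frac * len(order))))
    return float(np.mean(np.asarray(quality)[order[:k]]))


# Usage on synthetic data: targets follow a known linear recurrence (made up for the demo).
rng = np.random.default_rng(0)
H = 4  # number of attention heads used as features (assumption)


def make_seq(T):
    attn = rng.uniform(size=(T, H))
    probs = rng.uniform(0.05, 1.0, size=T)
    targets = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = 0.5 * (1 - probs[t]) + 0.3 * prev + 0.2 * attn[t, 0]
        targets[t] = prev
    return attn, probs, targets


train = [make_seq(30) for _ in range(20)]
w = two_stage_train(train)
attn, probs, targets = make_seq(30)
u = run_recurrent(w, attn, probs)
print("max abs error:", np.max(np.abs(u - targets)))
```

Refitting in stage 2 on predicted rather than gold recurrent scores is meant to reduce the mismatch between teacher-forced training features and the model's own outputs at inference time; whether the paper uses exactly this scheme, and this feature set, is an assumption of the sketch.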