Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty
This work examines how closely LLM uncertainty aligns with human uncertainty, a question that matters to researchers and developers aiming to build more human-like and trustworthy AI systems.
This paper investigates how similar large language model (LLM) uncertainty is to human uncertainty, a property the authors term "uncertainty alignment." The authors examine whether LLMs exhibit human-similar uncertainty signals and good calibration simultaneously across multiple-choice and open-ended factual recall tasks, and they characterize the impact of instruct fine-tuning on both.
Uncertainty quantification is a large and growing subfield of large language model behavioral analysis. Motivated primarily by the need to recognize and combat hallucination, the field has largely focused on measuring and improving calibration, that is, how accurately a model's uncertainty judgments track its task performance. In this work, we investigate the relatively underexplored question of how similar large language model uncertainty is to human uncertainty. We measure the presence and strength of human-similar uncertainty signals, termed uncertainty alignment, in large language model overt behavior and internal activation patterns. We identify whether the models show evidence of simultaneous alignment and calibration on a variety of datasets covering both multiple-choice and open-ended factual recall. Finally, we characterize the effect of instruct fine-tuning on each of these facets.