AI · May 31, 2025

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs

Chenjun Xu, Bingbing Wen, Bin Han, Robert Wolfe, Lucy Lu Wang, Bill Howe

AI2, UW

arXiv:2506.00582v2 · 12 citations · h-index: 30 · ACL

Originality: Incremental advance

AI Analysis

The paper addresses overconfidence in LLMs, which matters for interpretability and fairness, but the advance is incremental because it builds on existing calibration methods.

The study examined how LLMs estimate their confidence on QA tasks. It found that their confidence patterns differ from human ones and that the models show stereotypical biases when prompted with personas. The authors propose Answer-Free Confidence Estimation (AFCE) to reduce overconfidence and improve calibration, and report significant reductions in overconfidence on the MMLU and GPQA datasets.

Psychology research has shown that humans are poor at estimating their performance on tasks, tending towards underconfidence on easy tasks and overconfidence on difficult tasks. We examine three LLMs, Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o, on a range of QA tasks of varying difficulty, and show that models exhibit subtle differences from human patterns of overconfidence. The models are less sensitive to task difficulty, and when prompted to answer as different personas (e.g., expert vs. layman, or different races, genders, and ages), they respond with stereotypically biased confidence estimates even though their underlying answer accuracy remains the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that employs two stages of prompting: it first elicits only confidence scores on questions, then asks separately for the answers. Experiments on the MMLU and GPQA datasets, spanning subjects and difficulty levels, show that this separation of tasks significantly reduces overconfidence and delivers more human-like sensitivity to task difficulty.
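The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The prompt wording, the 0-100 confidence scale, and the names `AskFn`, `parse_confidence`, `afce`, and `overconfidence_gap` are all assumptions made for illustration, and `ask` stands in for whatever client sends a prompt to a model and returns its text reply. The gap function uses one simple notion of overconfidence (mean stated confidence minus accuracy), which may differ from the metric used in the paper.

```python
import re
from typing import Callable, Sequence, Tuple

# Hypothetical interface: takes a prompt string, returns the model's text reply.
AskFn = Callable[[str], str]


def parse_confidence(reply: str) -> float:
    """Extract the first number in the reply and map it from 0-100 to 0.0-1.0."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no confidence value found in reply: {reply!r}")
    value = float(match.group())
    # Clamp to the assumed 0-100 scale before normalising.
    return min(max(value, 0.0), 100.0) / 100.0


def afce(ask: AskFn, question: str) -> Tuple[float, str]:
    """Two separate prompts: confidence first (no answer), then the answer."""
    # Stage 1: elicit only a confidence score. The wording is an assumption.
    confidence_prompt = (
        "Do not answer the question below. Reply only with a number from 0 to "
        "100 giving your confidence that you would answer it correctly.\n\n"
        + question
    )
    confidence = parse_confidence(ask(confidence_prompt))

    # Stage 2: ask for the answer in an independent prompt.
    answer_prompt = "Answer the question below. Reply only with the answer.\n\n" + question
    answer = ask(answer_prompt).strip()
    return confidence, answer


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy; a positive value means overconfidence."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_confidence - accuracy
```

To use the sketch, pass any function that wraps a model call as `ask`, collect the `(confidence, answer)` pairs over a question set, grade the answers, and feed the confidences and correctness flags to `overconfidence_gap`. Keeping the two prompts independent is the point of the design: the confidence estimate is produced before the model has committed to any answer.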
