LG · May 22

LLMs Show No Signs Of Individuated Metacognition

arXiv:2605.24299 · 37.9

Predicted impact top 65% in LG · last 90 days
Originality Incremental advance

AI Analysis

For researchers and practitioners relying on confidence-weighted methods, this paper shows that LLM confidence is not individually informative, undermining assumptions behind selective abstention and ensemble weighting.

The study decomposes binary confidence judgments from 20 LLMs across six benchmarks, finding that cross-model confidence is largely rank-one and driven by item difficulty, with no evidence for significant individuated metacognition beyond shared difficulty factors.

Confidence-weighted routing, selective abstention, and ensemble weighting all assume that a model's stated confidence is informative about its capability on the question being asked. They presume functional metacognition, the capacity to assess one's own capabilities without exercising them. Aggregate calibration is well studied, with mixed results, but the underlying structure of elicited confidence is less well understood. We decompose binary confidence judgements from 20 frontier Large Language Models (LLMs) across six benchmarks using tetrachoric factor analysis paired with pairwise calibration, asking whether two models that differ in confidence also differ in performance. On factual recall and information retrieval benchmarks the cross-model confidence matrix is approximately rank-one and a single dominant factor captures most of the latent variance. Models retrieving facts share an item-level difficulty axis and differ mainly in their decision thresholds along it. Across all benchmarks the relationship between confidence and performance collapses once items that all models agree on are removed. Inter-model pairwise calibration is small even where statistically significant, and what remains shrinks to nothing once base-rate differences along the shared factor are controlled for. Mathematical reasoning is the apparent exception, but this turns out to be a confound in which reasoning models answer questions about their confidence by trying to solve them in their chain of thought, bypassing the sub-symbolic self-knowledge we seek to measure. We find no evidence for significant verbalised individuated metacognition in any tested domain.
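The two questions the abstract asks of the confidence data (is the cross-model matrix close to rank-one, and do two models that differ in confidence also differ in performance) can be illustrated with a minimal sketch. This is not the paper's procedure: the function names `top_factor_share` and `pairwise_gap` are hypothetical, plain Pearson (phi) correlation is swapped in for the tetrachoric correlation the abstract names, and the pairwise statistic is one assumed reading of "pairwise calibration".

```python
import numpy as np


def top_factor_share(conf):
    """Share of total variance carried by the largest eigenvalue of the
    model-by-model correlation matrix of binary confidence judgements.

    conf: (n_items, n_models) array of 0/1 values.
    Assumption: Pearson (phi) correlation stands in for the tetrachoric
    correlation named in the abstract. Every model must give both
    answers at least once, otherwise its correlation is undefined.
    """
    conf = np.asarray(conf, dtype=float)
    corr = np.corrcoef(conf, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
    return float(eigvals[-1] / eigvals.sum())


def pairwise_gap(conf_a, conf_b, correct_a, correct_b):
    """Mean accuracy gap (more confident model minus less confident model)
    over the items where two models' binary confidences differ.

    Hypothetical statistic for illustration only. A value near zero means
    that differing in confidence says little about differing in
    performance. Returns nan if the models never differ in confidence.
    """
    conf_a = np.asarray(conf_a)
    conf_b = np.asarray(conf_b)
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)

    differ = conf_a != conf_b
    if not differ.any():
        return float("nan")

    a_more = conf_a[differ] > conf_b[differ]
    gap = np.where(
        a_more,
        correct_a[differ] - correct_b[differ],
        correct_b[differ] - correct_a[differ],
    )
    return float(gap.mean())
```

A share close to 1 from `top_factor_share` would correspond to the approximately rank-one structure described above, and a `pairwise_gap` close to 0 to confidence differences that do not track performance differences. Controlling for base-rate differences along the shared factor, as the abstract describes, is not attempted here.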
