Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals
This work addresses the need for efficient and robust confidence estimation in high-stakes LLM applications, offering a practical solution for resource-constrained settings.
The paper tackles the brittleness of confidence estimation for large language models under distribution shift and on domain-specialized text. It proposes Structural Confidence, a single-pass framework that uses multi-scale structural signals from hidden-state trajectories, and reports strong AUROC and AUPR across four benchmarks, including FEVER and TruthfulQA.
Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, semantic similarity, and multi-sample consistency, remain brittle under distribution shift, domain-specialised text, and compute limits. In this work, we present Structural Confidence, a single-pass, model-agnostic framework that improves output correctness prediction using multi-scale structural signals derived from a model's final-layer hidden-state trajectory. By combining spectral, local-variation, and global shape descriptors, our method captures internal stability patterns that are missed by probabilities and sentence embeddings. We conduct an extensive cross-domain evaluation on four heterogeneous benchmarks: FEVER (fact verification), SciFact (scientific claims), WikiBio-hallucination (biographical consistency), and TruthfulQA (truthfulness-oriented QA). Our Structural Confidence framework demonstrates strong performance compared with established baselines in terms of AUROC and AUPR. More importantly, unlike sampling-based consistency methods, which require multiple stochastic generations and an auxiliary model, our approach uses a single deterministic forward pass, offering a practical basis for efficient, robust post-hoc confidence estimation in socially impactful, resource-constrained LLM applications.
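To make the three descriptor families concrete, the following is a minimal sketch of how spectral, local-variation, and global shape features might be computed from a hidden-state trajectory. The abstract does not specify the exact descriptors, so every choice here is an illustrative assumption rather than the authors' implementation: the function name `structural_features`, the low-frequency power share, the step-size statistics, and the straightness ratio are all hypothetical. The trajectory is assumed to be a list of T final-layer token vectors already extracted from a single forward pass.

```python
import cmath
import math
from typing import List, Sequence


def structural_features(hidden: Sequence[Sequence[float]]) -> List[float]:
    """Illustrative descriptors for a trajectory of T token vectors (T >= 2).

    Returns [low_frac, step_mean, step_std, straightness]. These four features
    are assumptions chosen for illustration, not the paper's feature set.
    """
    T, d = len(hidden), len(hidden[0])
    if T < 2:
        raise ValueError("need at least two token positions")

    # Local variation: Euclidean size of each token-to-token step.
    steps = [math.dist(hidden[t], hidden[t + 1]) for t in range(T - 1)]
    step_mean = sum(steps) / len(steps)
    step_std = math.sqrt(sum((s - step_mean) ** 2 for s in steps) / len(steps))

    # Global shape: end-to-end displacement divided by total path length
    # (1.0 for a straight line, near 0.0 for a path that returns to its start).
    path_len = sum(steps)
    straightness = math.dist(hidden[0], hidden[-1]) / path_len if path_len > 0 else 0.0

    # Spectral: share of non-DC power held by the lowest quarter of frequencies,
    # using a naive DFT along the token axis, summed over hidden dimensions.
    n_freq = T // 2
    power = [0.0] * (n_freq + 1)  # index 0 (DC) is deliberately left at zero
    for j in range(d):
        col = [hidden[t][j] for t in range(T)]
        for k in range(1, n_freq + 1):
            coeff = sum(col[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
            power[k] += abs(coeff) ** 2
    total = sum(power)
    cutoff = max(1, n_freq // 4)
    low_frac = sum(power[1:cutoff + 1]) / total if total > 0 else 0.0

    return [low_frac, step_mean, step_std, straightness]


# Toy trajectory: 8 tokens moving in a straight line through a 4-dimensional space.
line = [[float(t)] * 4 for t in range(8)]
features = structural_features(line)
```

Under these assumptions, one plausible use is to compute such a feature vector once per generated answer and pass it to a small supervised classifier that predicts correctness. That keeps the cost at a single forward pass, in contrast to sampling-based consistency methods.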