CL · AI · May 16

Double-Calibration: Towards Reliable LLMs via Calibrating Knowledge and Reasoning Confidence

Yuyin Lu, Ziran Liang, Yanghui Rao, Wenqi Fan, Fu Lee Wang, Qing Li

arXiv:2601.11956 · 94.2 · h-index: 13

Predicted impact: top 15% in CL · last 90 days
Originality: Highly original

AI Analysis

For LLM users needing reliable reasoning, DoublyCal addresses hallucination by providing well-calibrated confidence scores traceable to evidence uncertainty.

DoublyCal improves accuracy and confidence calibration of black-box LLMs on knowledge-intensive benchmarks by calibrating both knowledge evidence and reasoning confidence, achieving significant gains with low token cost.

Reliable reasoning in Large Language Models (LLMs) is challenged by their propensity for hallucination. While augmenting LLMs with Knowledge Graphs (KGs) improves factual accuracy, existing KG-augmented methods fail to quantify epistemic uncertainty in both the retrieved evidence and LLMs' reasoning. To bridge this gap, we introduce DoublyCal, a framework built on a novel double-calibration principle. DoublyCal employs a lightweight proxy model to first generate KG evidence alongside a calibrated evidence confidence. This calibrated supporting evidence then guides a black-box LLM, yielding final predictions that are not only more accurate but also well-calibrated, with confidence scores traceable to the uncertainty of the supporting evidence. Experiments on knowledge-intensive benchmarks show that DoublyCal significantly improves both the accuracy and confidence calibration of black-box LLMs while maintaining low token cost.
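To make "well-calibrated" concrete: a confidence score is calibrated when predictions made with confidence p are correct about a fraction p of the time. The sketch below computes Expected Calibration Error (ECE), a commonly used calibration metric. The abstract does not say which metric the authors report, so treat ECE as an assumption here; the function name, the binning scheme, and the sample numbers are illustrative and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    Illustrative helper only; not the evaluation code of DoublyCal.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin by confidence; 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical answers: stated confidence 0.9, but only 3 of 4 are right.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                           [True, True, True, False])
```

A lower value means stated confidence tracks observed accuracy more closely; in the hypothetical example the model claims 90% confidence but is right 75% of the time, so the gap is about 0.15.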
