Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design
This work addresses the critical need for reliable uncertainty quantification in LLMs, particularly for high-stakes applications, though the contribution is incremental, since it builds on existing PAC-Bayes and baseline methods.
The paper tackles the problem of unreliable uncertainty quantification in large language models by introducing information-lift certificates with formal abstention guarantees, achieving 77.0% coverage at 2% risk and blocking 96% of critical errors in high-stakes scenarios.
Large language models often produce confident but incorrect outputs, creating a critical need for reliable uncertainty quantification with formal abstention guarantees. We introduce information-lift certificates that compare model probabilities to a skeleton baseline, accumulating evidence through sub-gamma PAC-Bayes bounds that remain valid under heavy-tailed distributions where standard concentration inequalities fail. On eight diverse datasets, our method achieves 77.0\% coverage at 2\% risk, outperforming recent baselines by 10.0 percentage points on average. In high-stakes scenarios, we block 96\% of critical errors compared to 18-31\% for entropy-based methods. While our frequency-based certification does not guarantee severity-weighted safety and depends on skeleton quality, performance degrades gracefully under distributional shifts, making the approach practical for real-world deployment.
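To make the core idea concrete, the following is a minimal sketch of an information-lift score and a selective accept/abstain rule. It is an illustration under stated assumptions, not the paper's implementation: the function names (`information_lift`, `certify`, `coverage_and_risk`), the use of per-token probabilities, and the fixed `threshold` are all hypothetical. In particular, the sub-gamma PAC-Bayes bound that the abstract describes for calibrating the certificate is not reproduced here; a fixed threshold stands in for it.

```python
import math
from typing import Sequence, Tuple


def information_lift(model_probs: Sequence[float],
                     skeleton_probs: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Accumulated log-ratio (in nats) of model vs. skeleton probabilities.

    Assumption: both sequences hold per-token probabilities for the same
    output, and evidence accumulates by summing per-token log-ratios.
    """
    if len(model_probs) != len(skeleton_probs):
        raise ValueError("probability sequences must have equal length")
    return sum(
        math.log(max(p, eps)) - math.log(max(q, eps))
        for p, q in zip(model_probs, skeleton_probs)
    )


def certify(lift: float, threshold: float) -> bool:
    """Accept the output if the lift clears the threshold, else abstain.

    Assumption: `threshold` is a fixed placeholder; the paper derives its
    guarantee from a sub-gamma PAC-Bayes bound instead.
    """
    return lift >= threshold


def coverage_and_risk(lifts: Sequence[float],
                      correct: Sequence[bool],
                      threshold: float) -> Tuple[float, float]:
    """Empirical coverage (fraction accepted) and selective risk
    (error rate among accepted outputs) at a given threshold."""
    if len(lifts) != len(correct) or not lifts:
        raise ValueError("need equal-length, non-empty inputs")
    accepted = [c for lift, c in zip(lifts, correct) if certify(lift, threshold)]
    coverage = len(accepted) / len(lifts)
    risk = (sum(1 for c in accepted if not c) / len(accepted)) if accepted else 0.0
    return coverage, risk


# Toy usage with made-up numbers (not results from the paper).
lifts = [2.0, 0.5, 3.0, -1.0]
correct = [True, False, True, False]
coverage, risk = coverage_and_risk(lifts, correct, threshold=1.0)
```

In this toy run, the two outputs with lift at or above the threshold are accepted and the other two are abstained on. Sweeping `threshold` traces the coverage-risk trade-off that the reported operating point (coverage at a target risk level) summarizes.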