CL · Mar 22

Entropy Alone is Insufficient for Safe Selective Prediction in LLMs

Edward Phillips, Fredrik K. Gustafsson, Sean Wu, Anshul Thakur, David A. Clifton

arXiv:2603.21172 · 51.62 citations · h-index: 7

AI Analysis

This work addresses hallucination mitigation for the safe deployment of language models. It is incremental in that it builds on existing uncertainty quantification techniques.

The paper addresses unreliable abstention behaviour in selective prediction systems for LLMs. It identifies a model-dependent failure mode of entropy-based uncertainty methods and addresses it by combining entropy scores with a correctness probe signal. Across three QA benchmarks and four model families, the combined score generally improves both the risk-coverage trade-off and calibration performance relative to entropy-only baselines.

Selective prediction systems can mitigate harms resulting from language model hallucinations by abstaining from answering in high-risk cases. Uncertainty quantification techniques are often employed to identify such cases, but are rarely evaluated in the context of the wider selective prediction policy and its ability to operate at low target error rates. We identify a model-dependent failure mode of entropy-based uncertainty methods that leads to unreliable abstention behaviour, and address it by combining entropy scores with a correctness probe signal. We find that across three QA benchmarks (TriviaQA, BioASQ, MedicalQA) and four model families, the combined score generally improves both the risk-coverage trade-off and calibration performance relative to entropy-only baselines. Our results highlight the importance of deployment-facing evaluation of uncertainty methods, using metrics that directly reflect whether a system can be trusted to operate at a stated risk level.
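
To make the selective prediction setting concrete, the following is a minimal sketch of a risk-coverage evaluation. It is not the authors' implementation: the abstract does not say how the entropy and probe signals are combined, so the weighted sum in `combined_score`, the `weight` parameter, and all function names are assumptions for illustration, and the toy data is made up. It includes one confidently wrong answer (low entropy, incorrect), which is one way entropy alone could mislead an abstention policy.

```python
from typing import List, Sequence, Tuple


def combined_score(
    entropy: Sequence[float],
    probe_p_correct: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Hypothetical combination rule (an assumption, not the paper's).

    Returns an uncertainty score per example; higher means more uncertain.
    Entropy is scaled by its maximum so both terms lie roughly in [0, 1].
    """
    max_h = max(entropy) or 1.0  # avoid dividing by zero if all entropies are 0
    return [
        weight * (h / max_h) + (1.0 - weight) * (1.0 - p)
        for h, p in zip(entropy, probe_p_correct)
    ]


def risk_coverage(
    uncertainty: Sequence[float], correct: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Answer the k least-uncertain examples and record (coverage, risk) for each k."""
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    points = []
    errors = 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        points.append((k / len(order), errors / k))
    return points


def max_coverage_at_risk(
    points: Sequence[Tuple[float, float]], target_risk: float
) -> float:
    """Largest coverage whose error rate among answered examples is within the target."""
    feasible = [cov for cov, risk in points if risk <= target_risk]
    return max(feasible, default=0.0)


# Made-up toy data: example 1 has low entropy but is wrong.
entropy = [0.1, 0.2, 0.9, 1.5, 0.3]
probe_p_correct = [0.9, 0.2, 0.4, 0.3, 0.8]
correct = [True, False, False, False, True]

entropy_only = max_coverage_at_risk(risk_coverage(entropy, correct), 0.0)
combined = max_coverage_at_risk(
    risk_coverage(combined_score(entropy, probe_p_correct), correct), 0.0
)
```

On this toy data the entropy-only ranking places the confidently wrong answer second, so only a fifth of the questions can be answered at zero error, while the assumed combined score pushes it down the ranking and allows two fifths. This shows how the metric works and says nothing about the paper's actual results.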

