CL · LG · Apr 30

Geometry-Calibrated Conformal Abstention for Language Models

arXiv:2604.27914 · 96.0
Predicted impact: top 9% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners deploying language models, this work offers a principled way to reduce hallucinations by allowing models to abstain when uncertain, with statistical guarantees.

The paper proposes a post hoc framework, Conformal Abstention (CA), which uses conformal prediction to decide when a language model should abstain from answering a query. It provides finite-sample guarantees on both the probability of participation (not abstaining) and the correctness of the generated response. In the reported experiments, selective answering improves significantly, reaching 75% conditional correctness.

When language models lack relevant knowledge for a given query, they frequently generate plausible but hallucinated responses rather than admitting that they do not know the answer. Retraining models to reward admitting ignorance can lead to overly conservative behavior and poor generalization due to scarce evaluation benchmarks. We propose a post hoc framework, Conformal Abstention (CA), adapted from conformal prediction (CP) to determine whether to abstain from answering a query. CA provides finite-sample guarantees on both the probability of participation (i.e., not abstaining) and the probability that the generated response is correct. Importantly, the abstention decision relies on prediction confidence rather than the non-conformity scores used in CP, which are intractable for open-ended generation. To better align prediction confidence with the model's ignorance, we introduce a calibration strategy that uses representation geometry within the model to measure how much knowledge is involved in shaping the response. Experiments demonstrate that we significantly improve selective answering, reaching 75 percent conditional correctness.
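To make the idea of a confidence-based abstention rule with a finite-sample participation guarantee concrete, here is a minimal Python sketch. It is not the paper's algorithm: it uses the standard split-conformal quantile argument on a held-out calibration set and assumes calibration and test queries are exchangeable. The function names (`participation_threshold`, `answer_or_abstain`, `selective_accuracy`), the miscoverage level `alpha`, and the toy confidence values are all hypothetical, and the geometry-based calibration of confidence described above is not modeled. The correctness rate is computed only empirically on the calibration data, with no guarantee attached.

```python
import math


def participation_threshold(cal_conf, alpha):
    """Pick a confidence threshold tau from calibration confidences.

    Assumption (not from the paper): calibration and test confidences are
    exchangeable. Then answering whenever confidence >= tau gives
    P(participate) >= 1 - alpha, by the usual rank argument.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(cal_conf)
    k = math.floor(alpha * (n + 1))  # number of low-confidence points to cut
    if k == 0:
        # Too few calibration points to abstain at this level: always answer.
        return float("-inf")
    return sorted(cal_conf)[k - 1]  # k-th smallest calibration confidence


def answer_or_abstain(confidence, tau):
    """Answer if the confidence reaches the threshold, otherwise abstain."""
    return "answer" if confidence >= tau else "abstain"


def selective_accuracy(cal_conf, cal_correct, tau):
    """Empirical correctness among answered calibration queries (no guarantee)."""
    kept = [c for s, c in zip(cal_conf, cal_correct) if s >= tau]
    return sum(kept) / len(kept) if kept else float("nan")


# Toy calibration data (hypothetical): confidence and 0/1 correctness labels.
cal_conf = [0.9, 0.2, 0.6, 0.4, 0.8, 0.3, 0.7]
cal_correct = [1, 0, 1, 0, 1, 0, 1]

tau = participation_threshold(cal_conf, alpha=0.25)
decision = answer_or_abstain(0.25, tau)
accuracy = selective_accuracy(cal_conf, cal_correct, tau)
```

A smaller `alpha` lowers the threshold and so forces more participation. Controlling correctness among answered queries as well, as the abstract claims CA does, would need an additional calibration step that this sketch does not attempt.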
