Chance-Constrained Inference for Hallucination Risk Control in Large Language Models
This work addresses hallucination risk for users of large language models in applications that require reliable outputs. It presents a novel approach rather than an incremental improvement.
The paper addresses how to control the frequency of hallucinations in large language models. It formulates inference as a deployment-time risk control problem and introduces chance-constrained inference, which bounds the probability of hallucinations among accepted generations. Experiments on question-answering datasets demonstrate reliable risk control.
Large language models generate outputs stochastically and may produce fluent but invalid responses, including factual hallucinations. Existing mitigation strategies reduce average error rates but do not provide explicit control over the \emph{frequency} of such failures under repeated use. We formulate inference as a deployment-time risk control problem and introduce \emph{chance-constrained inference}, which directly bounds the probability of hallucinations among accepted generations. Hallucinations are modeled as stochastic constraint violations, and we show that confidence-based selective prediction does not, in general, imply probabilistic risk guarantees. To enforce chance constraints efficiently, we propose a sequential, anytime-valid inference procedure that adaptively certifies feasibility or infeasibility using finite samples, avoiding conservative fixed-sample bounds. Experiments on questions inspired by NaturalQuestions and controlled multi-hop question answering demonstrate reliable risk control, early detection of intrinsically infeasible inputs, and safe composition under repeated use, while confidence-based baselines fail to provide consistent guarantees.
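To make the idea of a sequential, anytime-valid certificate concrete, the following is a minimal sketch and not the procedure proposed in the paper. It assumes a hypothetical callable `sample_violation` that draws one generation for a fixed input and reports whether it violates the constraint (for example, is a hallucination), a target risk level `epsilon`, and an error level `delta`. The confidence sequence used here, a Hoeffding bound combined with a union bound over sample counts, is a simple stand-in chosen for illustration; the paper's actual bound may be tighter.

```python
import math
import random
from typing import Callable, Tuple


def sequential_chance_test(
    sample_violation: Callable[[], bool],  # hypothetical: one draw, True if it violates
    epsilon: float,                        # assumed target bound on violation probability
    delta: float,                          # assumed overall error probability of the test
    max_samples: int = 2000,
) -> Tuple[str, int, float]:
    """Sample until the violation rate is certified <= epsilon or > epsilon.

    At step n the test spends error budget delta / (n * (n + 1)). These
    budgets sum to delta over all n >= 1, so by a union bound the Hoeffding
    interval holds at every step simultaneously with probability >= 1 - delta.
    """
    violations = 0
    for n in range(1, max_samples + 1):
        violations += int(sample_violation())
        p_hat = violations / n
        delta_n = delta / (n * (n + 1))
        # Two-sided Hoeffding radius at confidence 1 - delta_n.
        radius = math.sqrt(math.log(2.0 / delta_n) / (2.0 * n))
        if p_hat + radius <= epsilon:
            return "feasible", n, p_hat      # upper bound is below the target
        if p_hat - radius > epsilon:
            return "infeasible", n, p_hat    # lower bound is above the target
    return "undecided", max_samples, violations / max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated input whose true violation rate (0.6) far exceeds epsilon.
    print(sequential_chance_test(lambda: rng.random() < 0.6, epsilon=0.1, delta=0.05))
    # Simulated input whose true violation rate (0.01) is well below epsilon.
    print(sequential_chance_test(lambda: rng.random() < 0.01, epsilon=0.1, delta=0.05))
```

In this sketch the test stops early when the violation rate is far from `epsilon` in either direction, which mirrors the early detection of infeasible inputs described above. Inputs whose true rate is close to `epsilon` may exhaust `max_samples` and return `"undecided"`.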