CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency

arXiv:2605.05873 · 58.0 · h-index: 4
Predicted impact: top 18% in ML · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners using LLM self-consistency, this provides a principled method to stop sampling with guaranteed error control, addressing a key practical bottleneck.

The paper tackles the problem of controlling error rates when deciding when to stop sampling in LLM self-consistency. It proposes the CITE algorithm, which provably controls false certification under arbitrary data-driven stopping, and it reports matching minimax lower bounds and improved certification in diffuse-tail settings.

Large language models often improve reasoning by sampling multiple outputs and aggregating their final answers, but precise and efficient control of error levels remains a challenging task. In particular, deciding when to stop sampling remains difficult when the stopping rule is data-dependent and the set of possible answers is not known in advance. We study anytime-valid certification of a prespecified target answer as the unique mode of the model's response distribution, a guarantee distinct from answer correctness. We propose the Certification by Intersection-union Testing with E-processes (CITE) algorithm, which provably controls false certification at any prescribed level under arbitrary data-driven stopping, without requiring prior knowledge of the answer category set. We also prove a category-set-size-free stopping-time rate, establish matching minimax lower bounds up to constants in the main regime, and extend the construction to confidence-weighted voting. Simulations and LLM self-consistency experiments show empirical error control and improved certification in diffuse-tail settings.
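To make the idea of anytime-valid mode certification concrete, here is a minimal Python sketch. It is not the paper's CITE construction, whose details the abstract does not give. It is a simplified stand-in that assumes i.i.d. samples and uses a fixed-bet pairwise test martingale: for each rival answer b, the wealth (1 + lam)^(count of target) * (1 - lam)^(count of b) is a nonnegative supermartingale whenever b is at least as likely as the target, so by Ville's inequality it exceeds 1/alpha with probability at most alpha. Certifying only when every rival's wealth clears 1/alpha is an intersection-union rule, and an answer that has not appeared yet is treated as a rival with count zero, so the answer set need not be known in advance. The function name `certify_mode` and the parameters `alpha` and `lam` are hypothetical choices for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def certify_mode(
    samples: Iterable[str],
    target: str,
    alpha: float = 0.05,
    lam: float = 0.5,
) -> Optional[int]:
    """Return the sample count at which `target` is certified as the unique
    mode, or None if the stream ends before certification.

    Illustrative sketch only (assumes i.i.d. samples, fixed bet size `lam`).
    """
    if not (0.0 < alpha < 1.0) or not (0.0 < lam < 1.0):
        raise ValueError("alpha and lam must both lie strictly between 0 and 1")

    log_threshold = math.log(1.0 / alpha)  # certify once log-wealth reaches this
    gain = math.log1p(lam)    # log-wealth gained per target sample
    loss = math.log1p(-lam)   # log-wealth lost per sample of a given rival (negative)

    counts: Counter = Counter()
    for n, answer in enumerate(samples, start=1):
        counts[answer] += 1
        n_target = counts[target]
        # The smallest wealth belongs to the most frequent rival; answers that
        # have never appeared have count 0, which is covered by the default.
        runner_up = max((c for a, c in counts.items() if a != target), default=0)
        min_log_wealth = n_target * gain + runner_up * loss
        if min_log_wealth >= log_threshold:
            return n
    return None
```

For example, `certify_mode(["42"] * 20, "42")` stops after 8 samples with the defaults, since 8 * log(1.5) is the first multiple to reach log(20). A stream that alternates between the target and one rival never certifies. As in the abstract, the guarantee concerns the target being the mode of the model's response distribution, not the answer being correct.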
