LG · CL · May 27

Self-Consistency via Marginal Sharpening

arXiv:2605.28142 · 82.8
AI Analysis

For practitioners using LLMs for reasoning tasks, the paper offers an inference-time sampling method that is faster and more effective than standard power sampling and requires no additional training.

The paper proposes a new inference-time objective for language models that sharpens the answer marginal rather than the full-output distribution, enabling efficient parallel sampling. This method outperforms standard power sampling on math and coding benchmarks while being orders of magnitude faster.

Inference-time sampling can elicit strong reasoning abilities from language models without additional training. Existing power-sampling methods do so by sharpening the distribution over full generated outputs, favoring completions that are individually likely under the model. We argue that this is the wrong object to target for reasoning: a completion entangles a reasoning trace with a final answer, whereas what matters is whether an answer is supported by many plausible reasoning paths. We therefore shift the target from the full-output distribution to the sharpened answer marginal, making self-consistency an inference-time objective rather than a post-hoc voting criterion. Surprisingly, this marginal target admits an efficient approximation: we propose a simple, purely autoregressive parallel sampling algorithm that approximately samples from the sharpened answer marginal, eliciting stronger performance than standard power sampling on mathematics and coding benchmarks while being orders of magnitude faster.
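The distinction between the two targets can be made concrete on a toy distribution. The sketch below is only an illustration under stated assumptions: the trace names, the probabilities, and the exponent `alpha` are hypothetical, and everything is computed by exact enumeration. It is not the paper's parallel sampling algorithm, which the abstract does not describe in detail. It compares sharpening the joint distribution over (trace, answer) pairs, as power sampling does, with sharpening the answer marginal directly.

```python
from collections import defaultdict


def sharpen(weights, alpha):
    """Raise each probability to the power alpha and renormalize."""
    powered = {key: w ** alpha for key, w in weights.items()}
    total = sum(powered.values())
    return {key: v / total for key, v in powered.items()}


def answer_marginal(joint):
    """Sum probability over reasoning traces for each final answer."""
    marginal = defaultdict(float)
    for (_trace, answer), p in joint.items():
        marginal[answer] += p
    return dict(marginal)


# Hypothetical toy model: answer "A" is reached by one high-probability
# trace, answer "B" by six lower-probability traces (total mass 0.6).
joint = {("t0", "A"): 0.4}
joint.update({(f"t{i}", "B"): 0.1 for i in range(1, 7)})

alpha = 4.0  # assumed sharpening exponent, chosen for illustration

# Power sampling target: sharpen full outputs, then read off the answers.
power_answers = answer_marginal(sharpen(joint, alpha))

# Marginal sharpening target: marginalize over traces first, then sharpen.
marginal_answers = sharpen(answer_marginal(joint), alpha)

print("sharpened full outputs:   ", power_answers)
print("sharpened answer marginal:", marginal_answers)
```

In this toy case, sharpening full outputs puts almost all mass on "A", because its single trace is individually the most likely completion. Sharpening the answer marginal favors "B", the answer supported by more total probability across reasoning paths, which is the self-consistency behavior the abstract argues for.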

