CL · Jun 3

Boosting Self-Consistency with Ranking

arXiv:2606.0505454.4
AI Analysis

RISC improves answer selection in LLM self-consistency, targeting practitioners who need better accuracy under a limited test-time compute budget.

Ranking-Improved Self-Consistency (RISC) reformulates answer selection in self-consistency as a ranking problem, using a lightweight LambdaRank model with five features. It achieves better accuracy-efficiency trade-offs than standard self-consistency across three datasets, with large gains on QA benchmarks.
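As a rough illustration of the ranking reformulation, the Python sketch below contrasts majority voting with scoring each distinct candidate answer and taking the top one. It is not the RISC implementation. The answer normalizer, the token-Jaccard "centrality" proxy, and the hand-set weights are hypothetical. Only two of the paper's feature families (answer frequency and semantic centrality) are approximated; reasoning-trace consistency is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    frequency: float   # share of sampled reasoning paths ending in this answer
    centrality: float  # hypothetical proxy: mean token-Jaccard similarity to other candidates


def normalize(answer: str) -> str:
    """Hypothetical normalizer: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


def majority_vote(samples: list[str]) -> str:
    """Standard self-consistency: the most frequent normalized answer wins."""
    return Counter(normalize(s) for s in samples).most_common(1)[0][0]


def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_candidates(samples: list[str]) -> list[Candidate]:
    """Collapse samples into distinct answers and attach two illustrative features."""
    counts = Counter(normalize(s) for s in samples)
    answers = list(counts)
    candidates = []
    for a in answers:
        others = [b for b in answers if b != a]
        centrality = sum(jaccard(a, b) for b in others) / len(others) if others else 1.0
        candidates.append(Candidate(a, counts[a] / len(samples), centrality))
    return candidates


def rank_select(candidates: list[Candidate], weights: tuple[float, float] = (1.0, 1.0)) -> str:
    """Ranking view: score every candidate with fixed weights and return the top answer."""
    w_freq, w_cent = weights
    best = max(candidates, key=lambda c: w_freq * c.frequency + w_cent * c.centrality)
    return best.answer


samples = ["Barack Obama", "Obama", "Barack H. Obama", "George Bush", "george  bush"]
print(majority_vote(samples))                  # george bush
print(rank_select(build_candidates(samples)))  # barack obama
```

In this toy case the answer held by most samples is split across three surface forms, so majority voting returns the single most frequent string. The combined score recovers it because those forms are mutually similar. In RISC the combination weights are learned rather than hand-set.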

Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We address this limitation with Ranking-Improved Self-Consistency (RISC), which reformulates answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, RISC uses a lightweight LambdaRank model to score candidate answers with five carefully designed features that capture answer frequency, semantic centrality, and reasoning-trace consistency. We evaluate RISC on three datasets under a range of test-time budgets. Across datasets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.
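RISC learns how to combine its features with LambdaRank. The sketch below shows the general LambdaRank update (RankNet pairwise gradients scaled by the NDCG change from swapping two candidates) for a linear scorer, in plain Python. Each question is one ranking group, and a label of 1 marks a correct candidate. The two-feature vectors, toy groups, linear model, learning rate, and step count are assumptions for illustration. The paper's model, its five features, and its training setup are not reproduced here.

```python
import math

# One group per question: (candidate feature vectors, relevance labels).
# Hypothetical features: [frequency, centrality]; label 1 marks a correct answer.
Group = tuple[list[list[float]], list[int]]


def score(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def lambdarank_step(groups: list[Group], w: list[float],
                    lr: float = 1.0, sigma: float = 1.0) -> list[float]:
    """One full-batch LambdaRank update of a linear scorer s(x) = w . x."""
    grad = [0.0] * len(w)
    for feats, rels in groups:
        s = [score(w, x) for x in feats]
        order = sorted(range(len(s)), key=lambda i: -s[i])
        rank = {cand: pos for pos, cand in enumerate(order)}  # 0-based current rank
        ideal = sorted(rels, reverse=True)
        idcg = sum((2 ** r - 1) / math.log2(k + 2) for k, r in enumerate(ideal))
        if idcg == 0:
            continue  # no correct candidate was sampled: nothing to learn here
        for i in range(len(feats)):
            for j in range(len(feats)):
                if rels[i] <= rels[j]:
                    continue  # only pairs where i should outrank j
                # |change in NDCG| if i and j swapped positions in the current ranking
                delta = abs((2 ** rels[i] - 2 ** rels[j])
                            * (1 / math.log2(rank[i] + 2) - 1 / math.log2(rank[j] + 2))) / idcg
                # RankNet pairwise gradient magnitude, scaled by |delta NDCG|
                lam = sigma / (1 + math.exp(sigma * (s[i] - s[j]))) * delta
                for k in range(len(w)):
                    grad[k] += lam * (feats[i][k] - feats[j][k])
    # Ascent step: pushes s_i - s_j up for every (correct, incorrect) pair
    return [wk + lr * gk for wk, gk in zip(w, grad)]


def select(w: list[float], feats: list[list[float]]) -> int:
    """Answer selection: index of the top-scored candidate."""
    return max(range(len(feats)), key=lambda i: score(w, feats[i]))


groups = [
    ([[0.2, 0.6], [0.4, 0.0]], [1, 0]),
    ([[0.6, 0.5], [0.4, 0.1]], [1, 0]),
    ([[0.3, 0.7], [0.5, 0.2], [0.2, 0.1]], [1, 0, 0]),
]
w = [1.0, 0.0]  # frequency only, which behaves like majority voting
print([select(w, f) for f, _ in groups])  # [1, 0, 1]
for _ in range(50):
    w = lambdarank_step(groups, w)
print([select(w, f) for f, _ in groups])  # [0, 0, 0]
```

Starting from frequency-only weights, the scorer picks an incorrect answer in the first and third groups. After training, the learned weight on the second feature lets it rank the correct candidate first in all three. A production setup would more likely use an existing LambdaRank implementation, such as a gradient-boosted ranker, than this hand-rolled linear version.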
