Self-Consistency
Self-Consistency Improves Chain of Thought Reasoning in Language Models
LLM reasoning / chain-of-thought · first seen Mar 21, 2022
heavily superseded — a standard baseline that newer methods routinely beat
24 papers critique it · 9 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Self-Consistency as a baseline.
“Although self-consistency can enhance output stability, it also often leads to hallucinations or invalid queries due to semantic errors since sampling at higher temperatures increases randomness instead of improving diversity.”
— LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting
“It improves accuracy without retraining but suffers from high inference cost and lacks intra-chain correction. The mistakes made early in a chain propagate to the end since errors are never revised mid-way.”
— Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs
“self-consistency must be inferred multiple times, burdening deployment budgets”
— Nash CoT: Multi-Path Inference with Preference Equilibrium
“However, these prevailing strategies are inherently bounded by the quality of the set of candidates. They are limited to produce a solution that transcends the quality of candidate proposals, which becomes particularly problematic when all candidates are flawed.”
— Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
“outperforming prior refinement methods~wang2023selfconsistency”
— Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue
“However, these structured reasoning methods universally require generating longer sequences or processing multiple reasoning paths, leading directly to a substantial increase in inference cost”
— Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning
“However, as self-consistency is fundamentally an extension of CoT, it is unclear whether self-consistency also improves performance on non-math questions that involve the recall of encyclopedic knowledge”
— Does Self-Consistency Improve the Recall of Encyclopedic Knowledge?
“Early SC approaches focused on majority voting for final answers”
— Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
“While effective, it uses a fixed number of calls and can fail when the correct answer is infrequent.”
— CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
“While effective, SC's primary drawback is its high computational cost.”
— Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
“However, despite its effectiveness, SC incurs significant computational costs at inference time due to its requirement for multiple sampling iterations.”
— Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
“these studies predominantly focus on short tasks (e.g., math problems, brief QA), where stochastic sampling produces independent errors. This assumption breaks down in long settings where systemic errors emerge due to position bias.”
— Self-Consistency Falls Short! The Adverse Effects of Positional Bias on Long-Context Problems
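Several of the critiques above target the same mechanism: a fixed number of sampled answers aggregated by majority vote. A minimal Python sketch of that mechanism may help make the cost and failure-mode complaints concrete. The names `self_consistency` and `make_stub_sampler` are hypothetical, and the stub stands in for a real model call that would sample one chain of thought and extract its final answer; this is an illustration under those assumptions, not any cited paper's implementation.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def self_consistency(sample_answer: Callable[[], str],
                     n_samples: int = 16) -> Tuple[str, float]:
    """Draw n_samples final answers and return (majority answer, vote share).

    sample_answer is assumed to run one sampled reasoning chain and
    return only its final answer as a string.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Every sample is a full model call, so cost grows linearly with n_samples.
    answers = [sample_answer() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


def make_stub_sampler(answers: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a model: replays a fixed list of answers."""
    it = iter(answers)
    return lambda: next(it)


# Typical case: the most frequent answer wins the vote.
typical = make_stub_sampler(["18", "18", "20", "18", "17", "18", "20", "18"])
print(self_consistency(typical, n_samples=8))

# Failure case from the critiques: if the correct answer ("18" here, by
# assumption) is infrequent, the vote confidently returns the wrong one.
flawed = make_stub_sampler(["20", "20", "18", "20"])
print(self_consistency(flawed, n_samples=4))
```

The vote can only select among answers that were actually sampled, which is the "bounded by the quality of the set of candidates" limitation quoted above.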
Beaten on benchmarks
Head-to-head results where a newer method reports beating Self-Consistency. Values are copied from the source paper's tables; verify them against the cited paper. In #Calls rows, lower is better.
- Nash CoT: Multi-Path Inference with Preference Equilibrium
Nash CoT (10 Paths) beats Self-Consistency · Avg. [Mistral-Instruct (7B)]
71.1 vs 70.8
- Nash CoT: Multi-Path Inference with Preference Equilibrium
Nash CoT beats Self-Consistency · Avg. [Mistral-Instruct (7B)]
42.0 vs 40.6
- Nash CoT: Multi-Path Inference with Preference Equilibrium
Nash CoT beats Self-Consistency · Avg. [GLM4 (large)]
95.5 vs 94.7
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · Acc (%) [AIME24]
77.7 vs 77.3
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [AIME24]
7.77 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · Acc (%) [MATH500]
81.3 vs 81.2
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [MATH500]
5.99 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [GSM8K]
2.41 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [MMLU_Pro]
8.24 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [GPQA]
11.90 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · #Calls [Avg]
7.26 vs 16.00
- CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-DeepConf (B10) beats Self-Consistency · Acc (%) [AIME24]
77.7 vs 77.3
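The #Calls rows above compare against a fixed budget of 16 samples. As a rough illustration of how any early-stopping rule can spend fewer calls, the Python sketch below stops as soon as the leading answer can no longer be overtaken by the samples that remain. This is a deliberately simple margin rule chosen for illustration, not the confidence-guided criterion that CGES uses; `early_stop_vote` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Callable, Tuple


def early_stop_vote(sample_answer: Callable[[], str],
                    max_calls: int = 16,
                    min_calls: int = 2) -> Tuple[str, int]:
    """Majority vote that stops once the leader is mathematically safe.

    Returns (answer, number of calls actually made). Stopping happens when
    the gap between the top two answers exceeds the remaining budget, so
    the runner-up could not catch up even with every remaining sample.
    """
    if max_calls < 1:
        raise ValueError("max_calls must be at least 1")
    counts: Counter = Counter()
    for calls in range(1, max_calls + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        remaining = max_calls - calls
        if calls >= min_calls and lead - runner_up > remaining:
            return ranked[0][0], calls
    # Budget exhausted: fall back to the plain majority answer.
    return counts.most_common(1)[0][0], max_calls


# A sampler that always agrees stops after 9 of 16 calls,
# because 9 votes cannot be overtaken by the 7 samples left.
print(early_stop_vote(lambda: "42", max_calls=16))
```

Because this rule waits until the outcome is certain, it saves less than the confidence-based numbers reported above; methods such as CGES accept some uncertainty in exchange for stopping earlier.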
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Tree-of-Thoughts · Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns · May 27, 2026
- Novelty-based Tree-of-Thought Search · Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning · May 7, 2026
- Decoding-Time Debiasing via Process Reward Models · Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation · May 4, 2026
- CoT-PoT ensembling · Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning · Apr 19, 2026
- Atropos · Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap · Apr 16, 2026
- Learning When to Sample · Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning · Mar 17, 2026