Is Self-Consistency superseded?

Self-Consistency (LLM reasoning / chain-of-thought) is heavily superseded: it is a standard baseline that newer methods routinely beat. 24 papers critique it and 9 beat it on benchmarks, ranking it #2 of 772 among the most-superseded methods. Sub-problem: the cluster led by Chain-of-Thought. Newer alternatives in the same sub-problem include Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, and Novelty-based Tree-of-Thought Search.



Self-Consistency

Self-Consistency Improves Chain of Thought Reasoning in Language Models

LLM reasoning / chain-of-thought · first seen Mar 21, 2022



What papers say

Verbatim critique sentences, each from a paper that cites Self-Consistency as a baseline.

“Although self-consistency can enhance output stability, it also often leads to hallucinations or invalid queries due to semantic errors since sampling at higher temperatures increases randomness instead of improving diversity.”
— LLM-Based SQL Generation: Prompting, Self-Refinement, and Adaptive Weighted Majority Voting
“It improves accuracy without retraining but suffers from high inference cost and lacks intra-chain correction. The mistakes made early in a chain propagate to the end since errors are never revised mid-way.”
— Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs
“self-consistency must be inferred multiple times, burdening deployment budgets”
— Nash CoT: Multi-Path Inference with Preference Equilibrium
“However, these prevailing strategies are inherently bounded by the quality of the set of candidates. They are limited to produce a solution that transcends the quality of candidate proposals, which becomes particularly problematic when all candidates are flawed.”
— Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
“outperforming prior refinement methods (Wang et al., 2023)”
— Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue
“However, these structured reasoning methods universally require generating longer sequences or processing multiple reasoning paths, leading directly to a substantial increase in inference cost”
— Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning
“However, as self-consistency is fundamentally an extension of CoT, it is unclear whether self-consistency also improves performance on non-math questions that involve the recall of encyclopedic knowledge”
— Does Self-Consistency Improve the Recall of Encyclopedic Knowledge?
“Early SC approaches focused on majority voting for final answers”
— Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
“While effective, it uses a fixed number of calls and can fail when the correct answer is infrequent.”
— CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
“While effective, SC's primary drawback is its high computational cost.”
— Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
“However, despite its effectiveness, SC incurs significant computational costs at inference time due to its requirement for multiple sampling iterations.”
— Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
“these studies predominantly focus on short tasks (e.g., math problems, brief QA), where stochastic sampling produces independent errors. This assumption breaks down in long settings where systemic errors emerge due to position bias.”
— Self-Consistency Falls Short! The Adverse Effects of Positional Bias on Long-Context Problems
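Several of the critiques above (majority voting over final answers, a fixed number of calls, high inference cost) follow directly from how the method works: Self-Consistency samples several chain-of-thought paths at a nonzero temperature and returns the most frequent final answer. The sketch below illustrates that loop under stated assumptions. `sample_answer` is a hypothetical callable standing in for one sampled model call plus final-answer extraction, and the default of 16 samples is an arbitrary illustrative choice. It is not the original paper's code.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def self_consistency(
    sample_answer: Callable[[], Optional[str]],  # hypothetical: one sampled CoT call + answer extraction
    num_samples: int = 16,                       # illustrative fixed budget
) -> Tuple[Optional[str], int]:
    """Majority vote over final answers from independently sampled reasoning paths.

    Returns (voted answer, number of model calls spent). Paths whose answer could
    not be parsed (None) cost a call but cast no vote.
    """
    votes = Counter()
    for _ in range(num_samples):  # fixed budget: every query pays num_samples calls
        answer = sample_answer()
        if answer is not None:
            votes[answer] += 1
    if not votes:
        return None, num_samples
    # most_common(1) breaks ties in favour of the answer that was seen first.
    answer, _count = votes.most_common(1)[0]
    return answer, num_samples


if __name__ == "__main__":
    # Scripted answers replace real model calls so the example is deterministic.
    scripted = iter(["42", "41", "42", None, "42", "17"])
    print(self_consistency(lambda: next(scripted), num_samples=6))  # ('42', 6)
```

The call count is always `num_samples`, regardless of how early the answers agree, which is the cost that the efficiency-oriented critiques target.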

Beaten on benchmarks

Head-to-head results where a newer method reports beating Self-Consistency. Values are copied from the source papers' tables; verify them against the cited paper. For #Calls (number of model calls), lower is better.

Source paper: Nash CoT: Multi-Path Inference with Preference Equilibrium
Nash CoT (10 Paths) beats Self-Consistency · Avg. [Mistral-Instruct (7B)]: 71.1 vs 70.8
Nash CoT beats Self-Consistency · Avg. [Mistral-Instruct (7B)]: 42.0 vs 40.6
Nash CoT beats Self-Consistency · Avg. [GLM4 (large)]: 95.5 vs 94.7

Source paper: CGES: Confidence-Guided Early Stopping for Efficient and Accurate Self-Consistency
CGES-LNS (arith) beats Self-Consistency · Acc (%) [AIME24]: 77.7 vs 77.3
CGES-LNS (arith) beats Self-Consistency · #Calls [AIME24]: 7.77 vs 16.00
CGES-LNS (arith) beats Self-Consistency · Acc (%) [MATH500]: 81.3 vs 81.2
CGES-LNS (arith) beats Self-Consistency · #Calls [MATH500]: 5.99 vs 16.00
CGES-LNS (arith) beats Self-Consistency · #Calls [GSM8K]: 2.41 vs 16.00
CGES-LNS (arith) beats Self-Consistency · #Calls [MMLU_Pro]: 8.24 vs 16.00
CGES-LNS (arith) beats Self-Consistency · #Calls [GPQA]: 11.90 vs 16.00
CGES-LNS (arith) beats Self-Consistency · #Calls [Avg]: 7.26 vs 16.00
CGES-DeepConf (B10) beats Self-Consistency · Acc (%) [AIME24]: 77.7 vs 77.3
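The #Calls rows show an adaptive method spending fewer model calls than Self-Consistency's fixed 16. As a minimal illustration of how early stopping can save calls, the sketch below uses a simple exact stopping rule, assumed here for illustration: stop as soon as the leading answer cannot be overtaken even if every remaining sample went to the runner-up. This is not CGES's confidence-guided criterion, which this page does not describe. `sample_answer` is a hypothetical callable for one sampled model call plus answer extraction.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def self_consistency_early_stop(
    sample_answer: Callable[[], Optional[str]],  # hypothetical: one sampled CoT call + answer extraction
    max_samples: int = 16,                       # same fixed ceiling as plain majority voting
) -> Tuple[Optional[str], int]:
    """Majority vote that stops once the leader's win is mathematically certain.

    Returns (voted answer, calls actually spent). Because the stop condition is
    strict, the answer always matches a full vote over max_samples calls.
    """
    votes = Counter()
    for calls in range(1, max_samples + 1):
        answer = sample_answer()
        if answer is not None:
            votes[answer] += 1
        remaining = max_samples - calls
        top = votes.most_common(2)
        if top:
            leader_answer, leader_count = top[0]
            runner_up_count = top[1][1] if len(top) > 1 else 0
            # Even if all remaining samples went to the runner-up (or to any new
            # answer), the leader would still finish strictly ahead.
            if leader_count > runner_up_count + remaining:
                return leader_answer, calls
    if not votes:
        return None, max_samples
    # No early stop: fall back to the plain majority vote (ties go to first seen).
    return votes.most_common(1)[0][0], max_samples


if __name__ == "__main__":
    unanimous = iter(["7"] * 16)  # scripted answers instead of real model calls
    print(self_consistency_early_stop(lambda: next(unanimous)))  # ('7', 9)
```

With 16 identical answers the loop stops after 9 calls instead of 16. For comparison, the CGES-LNS average reported above (7.26 calls against 16.00) is about 45% of the fixed budget; those savings come from the paper's own criterion, not from this rule.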

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.