Is RefineRL superseded?

RefineRL (LLM reasoning / chain-of-thought) is at the current frontier: it is recent and not yet superseded in the knowledge base. No papers critique it and none beat it on benchmarks, so it is not ranked as a superseded baseline. It belongs to the sub-problem cluster led by Chain-of-Thought. Newer alternatives in the same sub-problem include Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, and Novelty-based Tree-of-Thought Search.

RefineRL

RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

LLM reasoning / chain-of-thought · first seen Apr 1, 2026

current frontier — recent, not yet superseded in the knowledge base

0 papers critique it · 0 beat it on benchmarks

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

Marginal Sharpening · Self-Consistency via Marginal Sharpening · May 27, 2026
Tree-of-Thoughts · Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns · May 27, 2026
Co-ReAct · Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents · May 22, 2026
MA-CoT · Enhancing Reliability in LLM-Based Secure Code Generation · May 22, 2026
Novelty-based Tree-of-Thought Search · Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning · May 7, 2026
Decoding-Time Debiasing via Process Reward Models · Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation · May 4, 2026
Dual-Track CoT · Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs · Apr 27, 2026
R-CoV · R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs · Apr 22, 2026
CoT-PoT ensembling · Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning · Apr 19, 2026
Atropos · Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap · Apr 16, 2026
RefineRL · RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning · Apr 1, 2026
Learning When to Sample · Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning · Mar 17, 2026