PRM
PRM: Photometric Stereo based Large Reconstruction Model
LLM reasoning / chain-of-thought · first seen Dec 10, 2024
superseded — cited as a baseline and beaten by newer methods
7 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites PRM as a baseline.
“although PRMs can validate the output of LLMs more accurately than ORMs, they often require high quality annotated data”
— ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
“However, key challenges remain, such as the difficulty of obtaining high-quality labels and the limited effectiveness of current PRM approaches”
— More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty
“In our experiments, we find, however, that even state-of-the-art PRMs can be miscalibrated, assigning overly optimistic scores---particularly on challenging, out-of-distribution problems.”
— Know What You Don't Know: Uncertainty Calibration of Process Reward Models
“PRMs poorly approximate state values and reliability degrades with reasoning depth, suggesting credit assignment issues”
— Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
“Process Reward Models (PRMs) are bound by prohibitive annotation costs, while verifier-free proxies frequently yield sparse signals that lack awareness of the intermediate reasoning process.”
— Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
“PRM-based approaches require substantial computational resources for training step-level reward models and conducting multi-step inference processes”
— PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
“obtaining high-quality step-by-step annotations is challenging: current efforts relying on human annotation, Monte Carlo sampling, or LLM-as-a-judge are either costly or noisy.”
— Uncertainty-Aware Step-wise Verification with Generative Reward Models
Beaten on benchmarks
Head-to-head results where a newer method reports beating PRM. Values are copied from the source paper's tables — verify against the cited paper.
- ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
ReST-RL beats PRM · Average [Qwen3-8B]
0.689 vs 0.516
- ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
ReST-RL beats PRM · Average [Qwen2.5-Coder-7B-Instruct]
0.673 vs 0.591
- ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
ReST-RL beats PRM · Average [DS-Coder-6.7b-Instruct]
0.584 vs 0.539
- ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
ReST-RL beats PRM · Average [OpenCI-DS-6.7B]
0.583 vs 0.532
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS (Value) beats PRM · Accuracy [GLM4]
22.9 vs 22.0
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS (Value) beats PRM · Accuracy [GPT-3.5-turbo]
20.2 vs 17.4
- SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
SCPRM beats PRM · Pairwise ranking accuracy [Qwen2.5-1.5B]
89.76 vs 87.71
- SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
SCPRM beats PRM · Pairwise ranking accuracy [Qwen3-4B]
92.91 vs 89.37
- SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
SCPRM beats PRM · Pairwise ranking accuracy [Llama3.1-8B]
95.34 vs 91.83
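The size of each win is easier to compare as a margin than as a raw pair. The sketch below is illustrative only: it copies the nine value pairs listed above and computes the absolute and relative gain over PRM for each. The names `results` and `margins` are hypothetical, and the metrics are on different scales (fractions for ReST-RL, percentages for the others), so absolute gains are only comparable within the same paper.

```python
# Each row: (newer method, metric, base model, newer score, PRM score).
# Values are copied from the list above; verify against the cited papers.
results = [
    ("ReST-RL", "Average", "Qwen3-8B", 0.689, 0.516),
    ("ReST-RL", "Average", "Qwen2.5-Coder-7B-Instruct", 0.673, 0.591),
    ("ReST-RL", "Average", "DS-Coder-6.7b-Instruct", 0.584, 0.539),
    ("ReST-RL", "Average", "OpenCI-DS-6.7B", 0.583, 0.532),
    ("ReST-MCTS (Value)", "Accuracy", "GLM4", 22.9, 22.0),
    ("ReST-MCTS (Value)", "Accuracy", "GPT-3.5-turbo", 20.2, 17.4),
    ("SCPRM", "Pairwise ranking accuracy", "Qwen2.5-1.5B", 89.76, 87.71),
    ("SCPRM", "Pairwise ranking accuracy", "Qwen3-4B", 92.91, 89.37),
    ("SCPRM", "Pairwise ranking accuracy", "Llama3.1-8B", 95.34, 91.83),
]


def margins(new_score, prm_score):
    """Return (absolute gain, relative gain in percent) of new_score over prm_score."""
    absolute = new_score - prm_score
    relative = absolute / prm_score * 100.0
    return absolute, relative


if __name__ == "__main__":
    for method, metric, model, new_score, prm_score in results:
        abs_gain, rel_gain = margins(new_score, prm_score)
        print(f"{method} | {metric} [{model}]: +{abs_gain:.3f} ({rel_gain:+.1f}%)")
```

Relative gain is reported alongside the absolute one because it is the only figure that can be read across the fraction-scaled and percentage-scaled metrics.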
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 27, 2026
- Tree-of-Thoughts · Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns · May 27, 2026
- May 22, 2026
- May 22, 2026
- Novelty-based Tree-of-Thought Search · Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning · May 7, 2026
- Decoding-Time Debiasing via Process Reward Models · Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation · May 4, 2026
- Apr 27, 2026
- Apr 22, 2026
- CoT-PoT ensembling · Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning · Apr 19, 2026
- Atropos · Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap · Apr 16, 2026
- Apr 1, 2026
- Learning When to Sample · Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning · Mar 17, 2026