PRM (LLM reasoning / chain-of-thought): superseded, meaning it is cited as a baseline and beaten by newer methods. 7 papers critique it and 3 beat it on benchmarks, which ranks it #6 of 772 most-superseded. Sub-problem: the cluster led by Chain-of-Thought. Newer alternatives in the same sub-problem include Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, and Novelty-based Tree-of-Thought Search.

Superseded baseline · #6 of 772 most-superseded

PRM

PRM: Process Reward Model

LLM reasoning / chain-of-thought · first seen Dec 10, 2024

superseded — cited as a baseline and beaten by newer methods

7 papers critique it · 3 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites PRM as a baseline.

“although PRMs can validate the output of LLMs more accurately than ORMs, they often require high quality annotated data”
— ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
“However, key challenges remain, such as the difficulty of obtaining high-quality labels and the limited effectiveness of current PRM approaches”
— More Bang for the Buck: Process Reward Modeling with Entropy-Driven Uncertainty
“In our experiments, we find, however, that even state-of-the-art PRMs can be miscalibrated, assigning overly optimistic scores---particularly on challenging, out-of-distribution problems.”
— Know What You Don't Know: Uncertainty Calibration of Process Reward Models
“PRMs poorly approximate state values and reliability degrades with reasoning depth, suggesting credit assignment issues”
— Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
“Process Reward Models (PRMs) are bound by prohibitive annotation costs, while verifier-free proxies frequently yield sparse signals that lack awareness of the intermediate reasoning process.”
— Efficient Paths and Dense Rewards: Probabilistic Flow Reasoning for Large Language Models
“PRM-based approaches require substantial computational resources for training step-level reward models and conducting multi-step inference processes”
— PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
“obtaining high-quality step-by-step annotations is challenging: current efforts relying on human annotation, Monte Carlo sampling, or LLM-as-a-judge are either costly or noisy.”
— Uncertainty-Aware Step-wise Verification with Generative Reward Models
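
For readers less familiar with the distinction these quotes draw, here is a minimal sketch of how a process reward model (PRM) differs from an outcome reward model (ORM) at scoring time. The step scores below are hand-written placeholders, not output from any real model, and the aggregation rules (minimum and product over steps) are common choices assumed here for illustration; the cited papers are not confirmed to use them.

```python
from math import prod

# Hypothetical per-step correctness scores in [0, 1] that a PRM might assign
# to a four-step solution. These numbers are invented for illustration.
step_scores = [0.95, 0.90, 0.40, 0.85]

# Hypothetical single score an ORM might assign to the final answer only.
outcome_score = 0.80


def prm_min(scores):
    """Aggregate step scores by the weakest step."""
    return min(scores)


def prm_product(scores):
    """Aggregate step scores by multiplying them together."""
    return prod(scores)


def first_weak_step(scores, threshold=0.5):
    """Return the 0-based index of the first step below threshold, or None."""
    for i, score in enumerate(scores):
        if score < threshold:
            return i
    return None


print("ORM score:", outcome_score)
print("PRM min:", prm_min(step_scores))
print("PRM product:", round(prm_product(step_scores), 4))
print("First weak step:", first_weak_step(step_scores))
```

The step-level view can point at the weak step, which a single outcome score cannot. It also needs one label per step rather than one per solution, which is the annotation cost several of the quotes above raise.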

Beaten on benchmarks

Head-to-head results where a newer method reports beating PRM. Values are copied from the source paper's tables — verify against the cited paper.

ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
ReST-RL beats PRM · Average [Qwen3-8B] · 0.689 vs 0.516
ReST-RL beats PRM · Average [Qwen2.5-Coder-7B-Instruct] · 0.673 vs 0.591
ReST-RL beats PRM · Average [DS-Coder-6.7b-Instruct] · 0.584 vs 0.539
ReST-RL beats PRM · Average [OpenCI-DS-6.7B] · 0.583 vs 0.532

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
ReST-MCTS (Value) beats PRM · Accuracy [GLM4] · 22.9 vs 22.0
ReST-MCTS (Value) beats PRM · Accuracy [GPT-3.5-turbo] · 20.2 vs 17.4

SCPRM: A Schema-aware Cumulative Process Reward Model for Knowledge Graph Question Answering
SCPRM beats PRM · Pairwise ranking accuracy [Qwen2.5-1.5B] · 89.76 vs 87.71
SCPRM beats PRM · Pairwise ranking accuracy [Qwen3-4B] · 92.91 vs 89.37
SCPRM beats PRM · Pairwise ranking accuracy [Llama3.1-8B] · 95.34 vs 91.83
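
To compare the size of these gaps, the sketch below computes the absolute and relative margin for each head-to-head result. The scores are copied from the list above; the `margin` helper and the tuple layout are illustrative assumptions. The three papers report on different scales (fractions versus percentages) and on different tasks, so absolute margins are only comparable within one paper.

```python
# (newer method, benchmark setting, newer score, PRM score), copied from the list above.
results = [
    ("ReST-RL", "Average [Qwen3-8B]", 0.689, 0.516),
    ("ReST-RL", "Average [Qwen2.5-Coder-7B-Instruct]", 0.673, 0.591),
    ("ReST-RL", "Average [DS-Coder-6.7b-Instruct]", 0.584, 0.539),
    ("ReST-RL", "Average [OpenCI-DS-6.7B]", 0.583, 0.532),
    ("ReST-MCTS (Value)", "Accuracy [GLM4]", 22.9, 22.0),
    ("ReST-MCTS (Value)", "Accuracy [GPT-3.5-turbo]", 20.2, 17.4),
    ("SCPRM", "Pairwise ranking accuracy [Qwen2.5-1.5B]", 89.76, 87.71),
    ("SCPRM", "Pairwise ranking accuracy [Qwen3-4B]", 92.91, 89.37),
    ("SCPRM", "Pairwise ranking accuracy [Llama3.1-8B]", 95.34, 91.83),
]


def margin(new, base):
    """Return (absolute gain in the paper's own units, relative gain in percent)."""
    absolute = new - base
    relative = absolute / base * 100
    return absolute, relative


for method, setting, new, base in results:
    absolute, relative = margin(new, base)
    print(f"{method:18} {setting:42} {absolute:+.3f} ({relative:+.1f}%)")
```

The relative column is unit-free, so it is the safer figure when setting results from different papers side by side.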

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

Marginal Sharpening, Tree-of-Thoughts, Co-ReAct, MA-CoT, Novelty-based Tree-of-Thought Search