Superseded baseline · #25 of 772 most-superseded
EurusPRM-Stage1
LLM reasoning / chain-of-thought
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 2 beat it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating EurusPRM-Stage1. Values are copied from the source paper's tables — verify against the cited paper.
- GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning
GroundedPRM beats EurusPRM-Stage1 · F1 score [auto-labeled supervision at 40K samples]
39.7 vs 31.2
- GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
GenPRM-7B (Maj@8) beats EurusPRM-Stage1 · Avg. [PRMs (7-8B)]
80.5 vs 31.2
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 3, 2026
- May 2, 2026
- Apr 19, 2026
- DC-W2S · DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning · Mar 9, 2026
- Feb 9, 2026
- Jan 29, 2026
- Noise-Aware Iterative Training (NAIT) · Towards Robust Process Reward Modeling via Noise-aware Learning · Jan 19, 2026
- GroundedPRM · GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning · Oct 16, 2025
- group-relative advantage reinforcement learning · Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA · Sep 30, 2025