Is VersaPRM superseded?

VersaPRM (LLM reasoning / chain-of-thought): superseded, meaning it is cited as a baseline and beaten by newer methods. Two papers critique it and one beats it on benchmarks, which ranks it #20 of 772 most-superseded. Sub-problem: cluster led by ORM. Newer alternatives in the same sub-problem include SCI-PRM, GR-Ben, MedPRMBench, DC-W2S, and CoTZero.

Superseded baseline · #20 of 772 most-superseded

VersaPRM

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

LLM reasoning / chain-of-thought · first seen Feb 10, 2025

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 1 beats it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites VersaPRM as a baseline.

“However, it still draws on an annotation procedure that employs an LLM as a step-level judge, making it potentially error-prone, and uses only three labels (good, neutral, bad).”
— Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards
“there exists a substantial performance gap between these PRMs error-identification capability in general reasoning domains and that in mathematical domains”
— GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Beaten on benchmarks

Head-to-head results where a newer method reports beating VersaPRM. Values are copied from the source paper's tables — verify against the cited paper.

BoN w/ Full Set beats VersaPRM · Average F1 [All cell lines]
68.50 vs 57.16
DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.