Superseded baseline · #752 of 772 most-superseded
visualprm
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
LLM reasoning / chain-of-thought · first seen Mar 13, 2025
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites visualprm as a baseline.
“visualprm relied primarily on MC-score-based dataset construction, which strongly influenced PRM performance. Our central question is whether superior performance can be achieved by using stronger VLMs to judge each reasoning step, and subsequently employing these judgments as supervision signals for training VL-PRMs.”
— Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Verifiable Process Reward Models (VPRMs) · Beyond Outcome Verification: Verifiable Process Reward Models for Structured Reasoning · Jan 23, 2026
- perception-focused supervision · Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned · Sep 27, 2025