Noise-Aware Iterative Training (NAIT)
Towards Robust Process Reward Modeling via Noise-aware Learning · LLM reasoning / chain-of-thought · first seen Jan 19, 2026
current frontier — recent, not yet superseded in the knowledge base
0 papers critique it · 0 beat it on benchmarks
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Jun 3, 2026
- May 2, 2026
- Apr 19, 2026
- DC-W2S · DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning · Mar 9, 2026
- Feb 9, 2026
- Jan 29, 2026
- Noise-Aware Iterative Training (NAIT) · Towards Robust Process Reward Modeling via Noise-aware Learning · Jan 19, 2026
- GroundedPRM · GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning · Oct 16, 2025
- group-relative advantage reinforcement learning · Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA · Sep 30, 2025