CL · LG · Mar 18, 2025

Temporal Consistency for LLM Reasoning Process Error Identification

arXiv:2503.14495v1 · 9 citations · h-index: 15 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses verification challenges in mathematical reasoning for AI systems, representing an incremental improvement over existing methods.

The paper tackles the problem of verifying mathematical reasoning processes by introducing a temporal consistency method in which a verifier iteratively refines its judgments. The method improves performance on Mathcheck, ProcessBench, and PRM800K, and it enables smaller distilled models to outperform larger models and GPT-4o on ProcessBench.

Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek-R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency
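The abstract describes verifiers that revisit their own previous assessment over several rounds and rely on the consistency of that sequence of judgments. The Python sketch below illustrates one simple reading of that idea for a single verifier; it is not the authors' implementation (the linked repository holds that). The callables `initial_verify` and `reflect`, the stopping rule (stop once the judgment is unchanged for `patience` consecutive reflection rounds, capped at `max_rounds`), and the encoding of a judgment as the 0-based index of the first erroneous step (-1 meaning no error) are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical verifier interfaces (assumed for illustration, not from the paper's code):
#   initial_verify(steps)   -> 0-based index of the first erroneous step, or -1 if none
#   reflect(steps, previous) -> revised index after reviewing the previous judgment
InitialVerify = Callable[[List[str]], int]
Reflect = Callable[[List[str], int], int]


def temporal_consistency(
    steps: List[str],
    initial_verify: InitialVerify,
    reflect: Reflect,
    patience: int = 2,
    max_rounds: int = 10,
) -> int:
    """Refine a judgment until it stays unchanged for `patience` consecutive rounds."""
    judgment = initial_verify(steps)
    stable_rounds = 0  # consecutive reflection rounds that left the judgment unchanged
    for _ in range(max_rounds):
        revised = reflect(steps, judgment)
        if revised == judgment:
            stable_rounds += 1
            if stable_rounds >= patience:
                break  # judgment is temporally consistent; stop early
        else:
            stable_rounds = 0  # judgment changed, so restart the stability count
        judgment = revised
    # If max_rounds is exhausted without stabilizing, return the latest judgment.
    return judgment


if __name__ == "__main__":
    steps = ["x = 2", "x + 3 = 6", "so the answer is 6"]

    # Toy stand-ins: the first pass flags step 2; reflection settles on step 1.
    def toy_initial(s: List[str]) -> int:
        return 2

    def toy_reflect(s: List[str], prev: int) -> int:
        return max(prev - 1, 1)

    print(temporal_consistency(steps, toy_initial, toy_reflect))
```

In the toy run the judgment moves from step 2 to step 1 and then stays there for two rounds, so the loop stops early and prints 1 (the step "x + 3 = 6"). In practice `initial_verify` and `reflect` would wrap model calls; requiring several unchanged rounds rather than one guards against accepting a judgment that is still drifting.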

Code Implementations: 1 repo