CL · May 28

Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning

arXiv:2605.30245 · 88.0

Predicted impact: top 40% in CL (last 90 days) · Originality: Highly original

AI Analysis

For researchers working on LLM reasoning, this work offers a new paradigm that improves mathematical reasoning by making problem understanding explicit, although it is incremental in that it extends existing plan-based methods.

The paper introduces PPC, a framework that adds an explicit problem-understanding stage (the preplan) before planning and execution in LLM mathematical reasoning. It achieves the best results on 39 of 40 metrics across four backbones and five benchmarks, improving maj@16 by +2.23 and pass@16 by +3.06 over the strongest baseline without extra inference tokens.
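The summary reports gains in maj@16 and pass@16 without defining them. Under their common definitions, maj@k counts a problem as solved when the most frequent final answer among k samples is correct, and pass@k counts it as solved when any of the k samples is correct. The sketch below assumes those definitions, exact string matching of already-normalized final answers, and first-occurrence tie-breaking for the majority vote; the paper's exact answer-equivalence and tie-breaking rules are not stated here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pass_at_k(samples: Sequence[str], reference: str) -> bool:
    """Solved if ANY of the k sampled final answers matches the reference."""
    return any(answer == reference for answer in samples)


def maj_at_k(samples: Sequence[str], reference: str) -> bool:
    """Solved if the MOST FREQUENT of the k sampled final answers matches the reference.

    Assumption: ties are broken by first occurrence, because Counter.most_common
    orders equal counts by the order in which elements were first encountered.
    """
    if not samples:
        return False
    majority_answer, _ = Counter(samples).most_common(1)[0]
    return majority_answer == reference


def benchmark_scores(all_samples: List[Sequence[str]],
                     references: List[str]) -> Tuple[float, float]:
    """Return (maj@k, pass@k) as percentages of problems solved."""
    n = len(references)
    maj = sum(maj_at_k(s, r) for s, r in zip(all_samples, references))
    pas = sum(pass_at_k(s, r) for s, r in zip(all_samples, references))
    return 100.0 * maj / n, 100.0 * pas / n


if __name__ == "__main__":
    # Three toy problems with k = 4 samples each (the paper uses k = 16).
    samples = [["4", "4", "5", "4"], ["7", "3", "3", "9"], ["2", "1", "1", "1"]]
    references = ["4", "7", "2"]
    print(benchmark_scores(samples, references))
```

In this toy run, only the first problem is solved by majority vote, while every problem has at least one correct sample, so pass@k is always at least maj@k. This is why the two metrics are usually reported side by side.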

Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While effective, a closer examination reveals an inherent paradigm-level gap: both the planning stage and the execution stage decide how to solve a problem, whereas the prior question of what to solve (recognizing the problem type, the applicable tools, and the foreseeable pitfalls) remains entirely implicit. To bridge this gap, we propose PPC (Preplan-Plan-CoT), a framework that introduces an explicit problem-understanding stage, the preplan, yielding a new question $\rightarrow$ preplan $\rightarrow$ plan $\rightarrow$ cot paradigm. Realizing this paradigm requires safeguarding the conceptual integrity of the preplan at both ends. Specifically, we design a three-stage synthesis pipeline with a spoiler-score detector that filters out leakage and spoiler failures to build clean preplan supervision, and a composite GRPO reward that requires the generated plan to genuinely follow from the preplan. Experiments across four backbones and five mathematical reasoning benchmarks show that PPC achieves the best results on 39 of 40 metrics, improving maj@16 and pass@16 by +2.23 and +3.06 over the strongest baseline without introducing additional inference-token overhead.
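The abstract names a composite GRPO reward but does not give its formula. The sketch below is a hypothetical illustration of how such a reward could plug into GRPO's group-relative advantage, where each rollout's reward is normalized by the mean and standard deviation of its sampling group. The `Rollout` type, the `plan_follows_preplan` score, and `consistency_weight` are all assumptions introduced for illustration; how PPC actually scores preplan-to-plan consistency is not described here. The use of population standard deviation is also a choice, since GRPO implementations differ on it.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    """One sampled completion following question -> preplan -> plan -> cot."""
    final_answer: str
    # Hypothetical score in [0, 1] for how well the plan follows from the preplan
    # (for example, produced by a judge model); not the paper's definition.
    plan_follows_preplan: float


def composite_reward(rollout: Rollout, reference: str,
                     consistency_weight: float = 0.5) -> float:
    """Hypothetical composite reward: answer correctness plus a weighted
    preplan-to-plan consistency term. Weighting is an assumption."""
    correctness = 1.0 if rollout.final_answer == reference else 0.0
    return correctness + consistency_weight * rollout.plan_follows_preplan


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - mean(r)) / (std(r) + eps) within one group.

    eps keeps the division finite when every reward in the group is equal,
    in which case all advantages are zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A group of four rollouts sampled for one question whose answer is "12".
    group = [Rollout("12", 1.0), Rollout("12", 0.2), Rollout("10", 0.9), Rollout("10", 0.0)]
    rewards = [composite_reward(r, "12") for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Keeping `consistency_weight` below 1 with a consistency score bounded by 1 means a wrong answer can never outscore a correct one in this sketch: consistency only reorders rollouts that agree on correctness.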

