Guarded Repair for Harm-Aware Post-hoc Replacement of LLM Mathematical Reasoning
For practitioners of LLM-based mathematical reasoning, this work provides a harm-aware selective replacement framework that reduces, but does not eliminate, the risk of breaking correct answers during post-hoc repair.
The paper tackles the asymmetric risk in post-hoc repair of LLM mathematical reasoning: fixing an incorrect trace is useful, but replacing a correct one is harmful. On the full GSM8K test set, GuardedRepair improves accuracy from 95.60% to 96.89% with no measured broken-correct cases in the main run. On a weak-reasoner ASDiv setting, it improves accuracy from 78.40% to 87.60%.
Post-hoc repair of LLM mathematical reasoning introduces an asymmetric risk: fixing an incorrect reasoning trace is useful, but replacing a trace that was already correct can be harmful. We study this problem under a selective replacement setting, where a system must decide whether a repaired candidate is safer than preserving the original cached trace. We present GuardedRepair, a guarded best-of-N repair framework that diagnoses cached reasoning traces, selectively triggers repair, and accepts answer-changing candidates only when deterministic verification guards support replacement. The framework combines lightweight symbolic checks, surface semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies. On the full GSM8K test set, where the initial reasoner already achieves 95.60% accuracy, GuardedRepair improves final accuracy to 96.89%, fixing 17 of 58 remaining errors without measured broken-correct cases in the main run. On a weak-reasoner ASDiv setting, accuracy improves from 78.40% to 87.60%. Direct regeneration baselines show that this gain is not explained by stronger-model re-solving alone: re-solving all GSM8K examples lowers accuracy to 93.03% and breaks 47 initially correct answers. Additional analyses show that guarded repair substantially improves the fixed/broken tradeoff, while also revealing that replacement risk is reduced rather than eliminated. These results support viewing post-hoc repair as harm-aware selective replacement rather than unconstrained re-solving.
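The selective-replacement decision described above can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Candidate`, `guarded_replace`, `arithmetic_guard`, and `fixed_broken` are hypothetical, and the single regex-based arithmetic check only stands in for the paper's symbolic checks and semantic-risk diagnostics, which the abstract does not specify. The sketch assumes candidates have already been generated and shows only the conservative acceptance rule: keep the cached trace unless an answer-changing candidate passes every deterministic guard.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Candidate:
    """A reasoning trace together with its final answer (hypothetical type)."""
    answer: str
    trace: str

# A guard is a deterministic check: (question, candidate) -> passes?
Guard = Callable[[str, Candidate], bool]

# Matches simple binary equations such as "3 * 4 = 12" inside a trace.
_NUM = r"(-?\d+(?:\.\d+)?)"
_EQ = re.compile(_NUM + r"\s*([+\-*/])\s*" + _NUM + r"\s*=\s*" + _NUM)

def arithmetic_guard(question: str, cand: Candidate) -> bool:
    """Illustrative guard: every 'a op b = c' equation in the trace must hold.

    Returns False when no checkable equation is found, so an unverifiable
    candidate is never accepted (a conservative assumption of this sketch).
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    found = False
    for a, op, b, c in _EQ.findall(cand.trace):
        if op == "/" and float(b) == 0.0:
            return False
        if abs(ops[op](float(a), float(b)) - float(c)) > 1e-6:
            return False
        found = True
    return found

def guarded_replace(
    question: str,
    original: Candidate,
    candidates: Sequence[Candidate],
    guards: Sequence[Guard],
    needs_repair: Callable[[str, Candidate], bool],
) -> Candidate:
    """Return the trace to keep: the cached original unless a candidate is safer."""
    if not needs_repair(question, original):
        return original  # repair is not triggered; preserve the cached trace
    for cand in candidates:  # bounded best-of-N pool
        if cand.answer == original.answer:
            continue  # not answer-changing, so nothing to replace
        if all(guard(question, cand) for guard in guards):
            return cand  # every guard supports replacement
    return original  # default to preservation

def fixed_broken(before: Sequence[bool], after: Sequence[bool]) -> tuple[int, int]:
    """Count fixed (wrong -> right) and broken (right -> wrong) examples."""
    fixed = sum(1 for b, a in zip(before, after) if not b and a)
    broken = sum(1 for b, a in zip(before, after) if b and not a)
    return fixed, broken
```

In this sketch the fallback is always the original trace, which mirrors the asymmetric-risk framing: a rejected repair costs nothing, while an accepted one must be justified by every guard. The `fixed_broken` helper reports the two counts behind the fixed/broken tradeoff discussed in the abstract.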