ReTreVal: Reasoning Tree with Validation and Cross-Problem Memory for Large Language Models
For LLM users, ReTreVal provides a method to improve reasoning performance across problems without fine-tuning, enabling smaller models to compete with larger ones.
ReTreVal is a training-free, inference-time reasoning framework that enables cross-problem learning by accumulating and revising strategy entries across problems. It achieves 85.8% pass@1 on MATH-500 (+8.6 pp over the strongest baseline, Self-Refine) and 54.4% on MMLU-Pro (+15.3 pp over Self-Refine).
Every existing inference-time reasoning framework discards all failure context at problem boundaries, leaving a model solving problem 500 no wiser than it was on problem 1. We present ReTreVal (Reasoning Tree with Validation), a training-free framework that closes this gap with three components: adaptive tree exploration with tool-augmented node refinement; typed-failure backtracking, which injects categorized error context into the recovered branch; and a self-rewriting memory that accumulates and revises strategy entries across problems. Together, these enable inference-time cross-problem learning on any fixed, unmodified LLM without fine-tuning. ReTreVal achieves 85.8% pass@1 on MATH-500 (+8.6 pp over Zero-Shot CoT, +8.6 pp over the strongest baseline, Self-Refine) and 54.4% on MMLU-Pro (+15.3 pp over Self-Refine), with a 3.4:1 win-to-regression ratio, which indicates genuine error recovery rather than noise. These capabilities, which previously required gradient updates, allow a 32B model to compete with much larger single-pass systems.
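To make the self-rewriting memory concrete, the following is a minimal sketch of one way such a memory could be organized. It is not the authors' implementation. It assumes entries are keyed by a failure category and that a repeated failure of the same category rewrites the stored advice. The names `StrategyEntry`, `StrategyMemory`, `record_failure`, and `as_prompt_context`, as well as the failure labels and advice strings, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyEntry:
    """One memory entry: advice tied to a failure category (hypothetical schema)."""
    failure_type: str  # illustrative label, e.g. "arithmetic"
    advice: str
    revisions: int = 0


@dataclass
class StrategyMemory:
    """Toy cross-problem memory keyed by failure type (illustrative only)."""
    entries: dict[str, StrategyEntry] = field(default_factory=dict)

    def record_failure(self, failure_type: str, advice: str) -> None:
        """Add a new entry, or rewrite the existing entry for this failure type."""
        entry = self.entries.get(failure_type)
        if entry is None:
            self.entries[failure_type] = StrategyEntry(failure_type, advice)
        else:
            entry.advice = advice
            entry.revisions += 1

    def as_prompt_context(self) -> str:
        """Render the memory as text to prepend to the next problem's prompt."""
        return "\n".join(
            f"- [{e.failure_type}] {e.advice}" for e in self.entries.values()
        )


# The memory persists across problems; the model weights are never touched.
memory = StrategyMemory()
memory.record_failure("arithmetic", "Recompute sums with a calculator tool.")
memory.record_failure("misread", "Restate the question before solving.")
memory.record_failure("arithmetic", "Verify every numeric step with a tool call.")
print(memory.as_prompt_context())
```

In this sketch the only state carried between problems is the text in `memory`, which is consistent with the claim that the underlying LLM stays fixed and unmodified. How ReTreVal actually selects, scores, or rewrites entries is not specified in the abstract.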