Learning to Reason Efficiently with A* Post-Training

Andreas Opedal, Francesco Ignazio Re, Abulhair Saparov, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

arXiv:2605.24597 · 44.3

Predicted impact: top 6% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For LLM reasoning tasks, this work offers a method that improves both correctness and efficiency by integrating classical search algorithms into training. The advance is incremental, since it combines existing techniques.

The authors frame natural language inference as a search problem and use A* search to guide LLMs in generating correct and efficient proofs. Llama-3.2 models (1B-3B) improved from near-zero accuracy to outperforming DeepSeek-V3.2, with A*-informed signals balancing accuracy and efficiency.

Many applications of large language models (LLMs) require deductive reasoning, yet models frequently produce incorrect or redundant inference steps. We frame natural language inference as a search problem where the final answer is the valid proof itself, requiring a reasoning procedure in which intermediate inferences are correct. Specifically, we investigate whether LLMs can learn to generate correct and efficient proofs with guidance from A* search -- an algorithm that guarantees an optimally efficient path to a goal. We explore two training techniques: supervised fine-tuning on execution traces from A* and reinforcement learning with A*-informed process reward models. Empirically, we find that Llama-3.2 models in the 1B--3B range benefit substantially from A* post-training, going from near-zero accuracy to outperforming DeepSeek-V3.2 -- a much larger model. Our analysis uncovers a trade-off: while simple correctness rewards maximize accuracy, A*-informed signals strike a balance between accuracy and efficiency. Furthermore, we find that on larger search spaces, models trained with imperfect heuristics exhibit superior accuracy. Our results demonstrate a promising direction towards reasoning guided by principles derived from classical search algorithms.
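To make the "proof as search" framing concrete, here is a minimal Python sketch of A* over a toy forward-chaining problem. It is an illustration under stated assumptions, not the authors' implementation: the rule format, the unit step cost, the trivial heuristic, and the names `a_star` and `shortest_proof` are all hypothetical choices made for this example.

```python
import heapq
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

State = FrozenSet[str]          # the set of facts derived so far
Rule = Tuple[Set[str], str]     # (premises, conclusion); hypothetical format


def a_star(
    start: State,
    is_goal: Callable[[State], bool],
    neighbors: Callable[[State], Iterable[Tuple[State, float]]],
    heuristic: Callable[[State], float],
) -> Optional[List[State]]:
    """Return a cheapest path of states from start to a goal, or None."""
    counter = 0  # unique tie-breaker so the heap never compares states
    frontier = [(heuristic(start), counter, start)]
    g_cost = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt, step_cost in neighbors(state):
            new_g = g_cost[state] + step_cost
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = state
                counter += 1
                heapq.heappush(frontier, (new_g + heuristic(nxt), counter, nxt))
    return None


def shortest_proof(axioms: Set[str], rules: List[Rule], goal: str) -> Optional[List[str]]:
    """Return the facts derived, in order, along a shortest proof of goal."""

    def neighbors(state: State):
        # Each applicable rule adds one new fact at unit cost.
        for premises, conclusion in rules:
            if premises <= state and conclusion not in state:
                yield state | {conclusion}, 1.0

    def heuristic(state: State) -> float:
        # Assumed heuristic: at least one more step is needed until the goal
        # is derived. It never overestimates, so it is admissible.
        return 0.0 if goal in state else 1.0

    path = a_star(frozenset(axioms), lambda s: goal in s, neighbors, heuristic)
    if path is None:
        return None
    # Consecutive states differ by exactly one newly derived fact.
    return [next(iter(b - a)) for a, b in zip(path, path[1:])]


# Toy rule set with two irrelevant branches ("c" and "e").
RULES = [({"a"}, "b"), ({"a"}, "c"), ({"b"}, "d"), ({"c"}, "e"), ({"d"}, "goal")]
print(shortest_proof({"a"}, RULES, "goal"))
```

In this toy setting the returned proof skips the irrelevant facts `c` and `e`, which is the kind of redundancy the abstract describes as an efficiency problem. How the paper turns such search traces into training data or reward signals is not specified here and is not modeled by this sketch.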
