A Training Data Recipe to Accelerate A* Search with Language Models
This work addresses the computational efficiency of combining LLMs with heuristic search for reasoning tasks, offering a domain-specific improvement.
The paper tackled the problem of accelerating A* search with language models by investigating coreset selection for training data, finding that both A* and LLMs require accurate predictions on nodes near the goal, and resulted in up to 15x fewer iterations and 5x wall-clock speed-up in search on planning domains.
Combining Large Language Models (LLMs) with heuristic search algorithms like A* holds the promise of enhanced LLM reasoning and scalable inference. To accelerate training and reduce computational demands, we investigate the coreset selection problem for the training data of LLM heuristic learning. Few methods to learn the heuristic functions consider the interaction between the search algorithm and the machine learning model. In this work, we empirically disentangle the requirements of A* search algorithm from the requirements of the LLM to generalise on this task. Surprisingly, we find an overlap between their requirements; A* requires more accurate predictions on search nodes near the goal, and LLMs need the same set of nodes for effective generalisation. With these insights, we derive a data-selection distribution for learning LLM-based heuristics. On three classical planning domains, maze navigation, Sokoban and sliding tile puzzles, our technique reduces the number of iterations required to find the solutions by up to 15x, with a wall-clock speed-up of search up to 5x. The codebase is at https://github.com/devaansh100/a_star.