Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning
This work broadens the evaluation of LLM planning performance by automating prompt generation. The contribution is incremental in that it builds on prior methods that relied on manual conversion.
The authors automate natural language prompt generation for LLM-based planning in PDDL domains. Their automatically generated prompts achieve performance similar to manually written ones and outperform both PDDL prompts and simple template-based NL prompts. LLM planning still lags far behind symbolic planners, although in some domains the best LLM configuration scales up further than A$^\star$ using LM-cut.
Large language models (LLMs) have revolutionized a large variety of NLP tasks. An active debate is to what extent they can do reasoning and planning. Prior work has assessed the latter in the specific context of PDDL planning, based on manually converting three PDDL domains into natural language (NL) prompts. Here we automate this conversion step, showing how to leverage an LLM to automatically generate NL prompts from PDDL input. Our automatically generated NL prompts result in LLM-planning performance similar to that of the previous manually generated ones. Beyond this, the automation enables us to run much larger experiments, providing for the first time a broad evaluation of LLM planning performance in PDDL. Our NL prompts yield better performance than PDDL prompts and simple template-based NL prompts. Compared to symbolic planners, LLM planning lags far behind; but in some domains, our best LLM configuration scales up further than A$^\star$ using LM-cut.
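
To make the notion of a simple template-based NL prompt concrete, the following is a minimal sketch of how such a baseline might turn PDDL action signatures into sentences. It is a hypothetical illustration, not the authors' implementation: the function names `parse_action_signatures` and `template_description`, the sentence template, and the toy Blocksworld fragment are all assumptions made for this example.

```python
import re

# Toy PDDL fragment used only for illustration (assumed, not from the paper).
DOMAIN = """
(define (domain blocksworld)
  (:action pick-up
    :parameters (?x - block)
    :precondition (clear ?x)
    :effect (holding ?x))
  (:action stack
    :parameters (?x - block ?y - block)
    :precondition (holding ?x)
    :effect (on ?x ?y)))
"""


def parse_action_signatures(pddl_domain: str) -> list[tuple[str, list[str]]]:
    """Extract (action name, parameter variables) pairs from a PDDL domain string."""
    pattern = re.compile(
        r"\(:action\s+([\w-]+)\s+:parameters\s+\(([^)]*)\)", re.IGNORECASE
    )
    signatures = []
    for name, params in pattern.findall(pddl_domain):
        # Keep only the variables (tokens starting with '?'); drop '-' and type names.
        variables = [tok for tok in params.split() if tok.startswith("?")]
        signatures.append((name, variables))
    return signatures


def template_description(name: str, variables: list[str]) -> str:
    """Fill a fixed sentence template with the action name and its arguments."""
    readable = name.replace("-", " ")
    if not variables:
        return f"You can {readable}."
    args = ", ".join(v.lstrip("?") for v in variables)
    return f"You can {readable} with arguments: {args}."


if __name__ == "__main__":
    for action_name, action_vars in parse_action_signatures(DOMAIN):
        print(template_description(action_name, action_vars))
```

A fixed template of this kind ignores preconditions and effects and produces stilted phrasing, which is one plausible reason an LLM-generated description could serve as a better prompt.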