AI · May 25, 2023

On the Planning Abilities of Large Language Models: A Critical Investigation

arXiv:2305.15771v2 · 434 citations
Originality: Incremental advance
AI Analysis

This work critically assesses claims about LLMs' reasoning capabilities, providing insights for AI researchers focused on planning and automation, though it is incremental in evaluating existing models.

The paper investigates the planning abilities of large language models (LLMs). It finds that autonomous plan generation is limited, with the best model (GPT-4) averaging only a ~12% success rate across domains, but that LLMs show more promise in LLM-Modulo settings, where they supply heuristic guidance to external planners and receive feedback from external verifiers.

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the LLM-Modulo setting show more promise. In the LLM-Modulo setting, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation.
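The back-prompting loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-block domain, the `verify_plan` and `backprompt_loop` functions, and the `fake_llm` stand-in for a real model call are all hypothetical names introduced here for illustration. The sketch assumes a STRIPS-style domain in which each action has preconditions, add effects, and delete effects, and it uses a simple plan simulator in place of an external verifier.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical STRIPS-style action: (preconditions, add effects, delete effects).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def verify_plan(plan: List[str], actions: Dict[str, Action],
                init: FrozenSet[str], goal: FrozenSet[str]) -> Optional[str]:
    """Simulate the plan; return None if it is valid, else a feedback message."""
    state = set(init)
    for step, name in enumerate(plan, start=1):
        if name not in actions:
            return f"step {step}: unknown action '{name}'"
        pre, add, delete = actions[name]
        missing = pre - state
        if missing:
            return f"step {step}: '{name}' has unmet preconditions {sorted(missing)}"
        state = (state - delete) | add
    unmet = goal - state
    if unmet:
        return f"plan ends without achieving {sorted(unmet)}"
    return None


def backprompt_loop(generate: Callable[[Optional[str]], List[str]],
                    actions: Dict[str, Action], init: FrozenSet[str],
                    goal: FrozenSet[str], max_rounds: int = 5):
    """Ask the generator for a plan, verify it, and feed errors back until valid."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        plan = generate(feedback)  # feedback is None on the first round
        feedback = verify_plan(plan, actions, init, goal)
        if feedback is None:
            return plan, round_no
    return None, max_rounds


# Toy domain (an assumption for this sketch): stack block A onto block B.
ACTIONS: Dict[str, Action] = {
    "pickup_A": (frozenset({"clear_A", "ontable_A", "handempty"}),
                 frozenset({"holding_A"}),
                 frozenset({"clear_A", "ontable_A", "handempty"})),
    "stack_A_B": (frozenset({"holding_A", "clear_B"}),
                  frozenset({"on_A_B", "clear_A", "handempty"}),
                  frozenset({"holding_A", "clear_B"})),
}
INIT = frozenset({"clear_A", "clear_B", "ontable_A", "ontable_B", "handempty"})
GOAL = frozenset({"on_A_B"})


def fake_llm(feedback: Optional[str]) -> List[str]:
    """Stand-in for a model call: wrong plan first, corrected plan after feedback."""
    if feedback is None:
        return ["stack_A_B"]
    return ["pickup_A", "stack_A_B"]


plan, rounds = backprompt_loop(fake_llm, ACTIONS, INIT, GOAL)
print(plan, rounds)
```

In a real setting, `fake_llm` would be replaced by a prompt to an actual model that includes the verifier's message, and the soundness of the final plan would rest on the verifier rather than on the model.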

Code Implementations: 2 repos