AI, CL · Feb 27, 2025

An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs

Kaustubh Vyas, Damien Graux, Sébastien Montella, Pavlos Vougiouklis, Ruofei Lai, Keshuang Li, Yang Ren, Jeff Z. Pan

arXiv:2502.20175v1

Originality: Synthesis-oriented

AI Analysis

This work assesses LLMs for AI planning, a capability that could affect automation and reasoning systems. Its contribution is incremental, however, because it benchmarks existing models.

The study evaluated how well 20 large language models (LLMs) from 7 model families understand and generate the Planning Domain Definition Language (PDDL) for formal planning tasks. It found that some models performed well while others struggled with complex scenarios.

Recent large language models (LLMs) have shown proficiency in code generation and chain-of-thought reasoning, laying the groundwork for tackling automatic formal planning tasks. This study evaluates the potential of LLMs to understand and generate the Planning Domain Definition Language (PDDL), an essential representation in artificial intelligence planning. We conduct an extensive analysis of 20 distinct models spanning 7 major LLM families, both commercial and open-source. Our comprehensive evaluation sheds light on the zero-shot capabilities of LLMs in parsing, generating, and reasoning with PDDL. Our findings indicate that while some models handle PDDL with notable effectiveness, others show limitations in more complex scenarios that require nuanced planning knowledge. These results highlight both the promise and the current limitations of LLMs in formal planning tasks, offering insights into their application and guiding future work on AI-driven planning.
