PROC2PDDL: Open-Domain Planning Representations from Texts
This work addresses AI planning in text-based environments by giving researchers a challenging benchmark. It is incremental, however: it builds on existing methods and contributes mainly new data.
The authors tackled the challenge of generating planning domain definitions from open-domain procedural texts by introducing Proc2PDDL, a dataset with expert-annotated PDDL representations, and found that state-of-the-art models like GPT-3.5 and GPT-4 achieved success rates close to 0% and around 35%, respectively.
Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis reveals both syntactic and semantic errors, indicating LMs' deficiency both in generating domain-specific programs and in reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
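Why preconditions and effects matter can be illustrated with a toy example. The Python sketch below uses a simplified STRIPS-style model, which is close to but not the same as PDDL. It is not the dataset's format or the paper's evaluation procedure. The action names (`light_fire`, `boil_water`) and predicates are hypothetical, invented for illustration. A small breadth-first planner finds a plan when the actions are defined correctly. It fails when one effect is omitted, the kind of semantic error an LM might make when it writes a domain definition.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical grounded STRIPS-style action (a simplification of a PDDL action).
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

def applicable(action, state):
    # An action may fire only if every precondition holds in the current state.
    return action.preconditions <= state

def apply_action(action, state):
    # STRIPS semantics: remove the delete effects, then add the add effects.
    return (state - action.del_effects) | action.add_effects

def plan(actions, start, goal):
    # Breadth-first search over states; returns a list of action names or None.
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for a in actions:
            if applicable(a, state):
                nxt = apply_action(a, state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None

# Hypothetical domain, loosely inspired by a procedural text.
light_fire = Action("light_fire", frozenset({"has_wood"}),
                    frozenset({"has_fire"}), frozenset({"has_wood"}))
boil_water = Action("boil_water", frozenset({"has_water", "has_fire", "has_pot"}),
                    frozenset({"has_boiled_water"}), frozenset({"has_water"}))
# A faulty definition that forgets the add effect, as an LM might.
boil_water_bad = Action("boil_water", boil_water.preconditions,
                        frozenset(), frozenset({"has_water"}))

start = frozenset({"has_wood", "has_water", "has_pot"})
goal = frozenset({"has_boiled_water"})
print(plan([light_fire, boil_water], start, goal))      # ['light_fire', 'boil_water']
print(plan([light_fire, boil_water_bad], start, goal))  # None
```

Even though the faulty domain is syntactically well formed, the planner cannot reach the goal with it. This is why semantic errors in predicted preconditions and effects can be as damaging as syntax errors.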