AI · LO · Oct 3, 2025

Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

Keshav Ramani, Vali Tawosi, Salwa Alamir, Daniel Borrajo

arXiv:2510.03469v1 · 9.65 citations · h-index: 8 · 2025 40th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)

Originality: Incremental advance

AI Analysis

This work addresses plan verification for AI systems, but it is incremental as it builds on existing methods and datasets.

The paper tackles the problem of verifying natural language plans by converting them into formal structures using LLMs and model checking, achieving an F1 score of 96.3% with GPT-5 on a simplified dataset.

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
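To make the verification step concrete, here is a minimal sketch of checking two temporal properties against a Kripke structure that encodes a plan. It is not the authors' pipeline: in the paper the formal models are produced by an LLM and handed to a model checker, whereas this toy is written by hand and only covers two LTL patterns, "always p" (G p) and "eventually p" (F p). The Blocksworld-style proposition names and the helper `linear_kripke` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Kripke:
    states: set   # state names
    init: str     # initial state
    trans: dict   # state -> set of successor states (assumed total)
    label: dict   # state -> set of atomic propositions true in that state


def linear_kripke(labels):
    """Hypothetical helper: build the chain s0 -> s1 -> ... -> sn from a list of
    per-step proposition sets, with a self-loop on the last state so that
    every path is infinite."""
    names = [f"s{i}" for i in range(len(labels))]
    trans = {a: {b} for a, b in zip(names, names[1:])}
    trans[names[-1]] = {names[-1]}
    return Kripke(set(names), names[0], trans, dict(zip(names, map(set, labels))))


def reachable(k):
    """All states reachable from the initial state."""
    seen, stack = {k.init}, [k.init]
    while stack:
        s = stack.pop()
        for t in k.trans.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def holds_always(k, pred):
    """G p on all paths: pred must hold on the labels of every reachable state."""
    return all(pred(k.label[s]) for s in reachable(k))


def holds_eventually(k, prop):
    """F p on all paths, computed as a least fixed point: a state qualifies if
    prop holds there, or if all of its successors already qualify."""
    sat = {s for s in k.states if prop in k.label[s]}
    changed = True
    while changed:
        changed = False
        for s in k.states - sat:
            succ = k.trans.get(s, set())
            if succ and succ <= sat:
                sat.add(s)
                changed = True
    return k.init in sat


# Plan: unstack A from B, put down A, pick up B, stack B on A (illustrative labels).
steps = [
    {"on_A_B", "hand_empty"},
    {"holding_A"},
    {"hand_empty"},
    {"holding_B"},
    {"on_B_A", "hand_empty"},
]
good = linear_kripke(steps)
bad = linear_kripke(steps[:-1])  # same plan with the final "stack" step missing

print(holds_eventually(good, "on_B_A"))  # True: the goal is reached
print(holds_always(good, lambda l: not {"holding_A", "holding_B"} <= l))  # True
print(holds_eventually(bad, "on_B_A"))   # False: the truncated plan never reaches the goal
```

In this sketch a plan is classified as valid when the goal property holds from the initial state. The abstract's distinction between syntactically and semantically perfect models maps onto it directly: a structure can be well-formed and checkable while its labels still misdescribe what the plan does.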
