AI · LO · Oct 3, 2025

Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

arXiv:2510.03469v1
5 citations · h-index: 8
2025 40th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)
AI Analysis

This work addresses plan verification for AI systems, but it is incremental as it builds on existing methods and datasets.

The paper tackles the problem of verifying natural language plans by converting them into formal structures using LLMs and model checking, achieving an F1 score of 96.3% with GPT-5 on a simplified dataset.

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
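To make the pipeline in the abstract concrete, the following is a minimal sketch of the checking step only, under stated assumptions. It does not reproduce the paper's prompts, formal models, or model checker. The Blocksworld-style propositions (`on_a_b`, `holding_a`, and so on), the tuple encoding of LTL formulas, and the helpers `plan_to_kripke`, `holds`, and `check` are all hypothetical names introduced for illustration. The plan is assumed to be linear, so the Kripke structure is a single path whose final state loops on itself, and a small hand-written evaluator stands in for a real model checker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kripke:
    states: tuple       # state names; states[0] is the initial state
    transitions: dict   # state -> list of successor states
    labels: dict        # state -> frozenset of atomic propositions true there


def plan_to_kripke(steps):
    """Build a linear Kripke structure from one proposition set per plan step.

    The last state gets a self-loop so that every path is infinite,
    as LTL semantics require.
    """
    states = tuple(f"s{i}" for i in range(len(steps)))
    last = len(states) - 1
    transitions = {s: [states[min(i + 1, last)]] for i, s in enumerate(states)}
    labels = {s: frozenset(props) for s, props in zip(states, steps)}
    return Kripke(states, transitions, labels)


def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a trace whose last element
    repeats forever. Formulas are nested tuples (hypothetical encoding)."""
    op, last = formula[0], len(trace) - 1
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "X":
        return holds(formula[1], trace, min(i + 1, last))
    if op == "G":
        # Positions beyond `last` are identical to `last`, so a finite scan suffices.
        return all(holds(formula[1], trace, j) for j in range(i, last + 1))
    if op == "F":
        return any(holds(formula[1], trace, j) for j in range(i, last + 1))
    raise ValueError(f"unsupported operator: {op}")


def check(kripke, formula):
    """Follow the unique path from the initial state and evaluate the formula."""
    state = kripke.states[0]
    path = [state]
    while kripke.transitions[state] != [state]:
        state = kripke.transitions[state][0]
        path.append(state)
    return holds(formula, [kripke.labels[s] for s in path])


# Expected behavior: eventually A is on B, and the hand never holds both blocks.
spec = ("and",
        ("F", ("ap", "on_a_b")),
        ("G", ("not", ("and", ("ap", "holding_a"), ("ap", "holding_b")))))

valid_plan = plan_to_kripke([
    {"ontable_a", "ontable_b", "clear_a", "clear_b", "handempty"},  # initial state
    {"holding_a", "ontable_b", "clear_b"},                          # pick up A
    {"on_a_b", "ontable_b", "clear_a", "handempty"},                # stack A on B
])
broken_plan = plan_to_kripke([
    {"ontable_a", "ontable_b", "clear_a", "clear_b", "handempty"},  # initial state
    {"holding_a", "ontable_b", "clear_b"},                          # pick up A, then stop
])

print(check(valid_plan, spec), check(broken_plan, spec))  # True False
```

In this sketch a plan is classified as valid exactly when the specification holds on its structure, which is the kind of binary verdict the Accuracy, Precision, Recall and F1 figures are computed over. A realistic setup would presumably hand the generated model and formula to an off-the-shelf model checker rather than a hand-rolled evaluator.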

