SE · LG · NI · Nov 22, 2025

Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation

Kuangxiangzi Liu, Dhiman Chakraborty, Alexander Liggesmeyer, Andreas Zeller

arXiv:2511.17977v1 · 3 citations

Originality: Incremental advance

AI Analysis

This work addresses the slow, error-prone manual derivation of test cases for safety-critical systems by automating the conversion of natural-language specifications into formal ones. It is incremental, however, since it builds on existing LLM and formal-methods techniques.

The paper tackles the problem of manually deriving test cases from natural-language specifications for safety-critical systems. It proposes a two-stage pipeline in which LLMs synthesize formal protocol specifications, which in turn enable automated test generation. Across real-world internet protocols, the prototype AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types and achieves 81.5% message acceptance.
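The headline metrics can be read as simple ratios. The sketch below rests on definitions the summary does not spell out, so treat them as assumptions. A protocol's recovery rate is taken to be the share of ground-truth message types that also appear in the synthesized spec. Acceptance is taken to be the share of generated messages that the implementation accepts. "On average" is read as an unweighted mean over protocols. The function names and the sample message sets are hypothetical and are not the paper's data.

```python
from statistics import mean


def recovery_rate(recovered: set[str], ground_truth: set[str]) -> float:
    """Share of ground-truth message types also present in the synthesized spec (assumed definition)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return len(recovered & ground_truth) / len(ground_truth)


def acceptance_rate(accepted: list[bool]) -> float:
    """Share of generated messages the implementation under test accepted (assumed definition)."""
    if not accepted:
        raise ValueError("no messages were sent")
    return sum(accepted) / len(accepted)


def macro_average(rates: list[float]) -> float:
    """Unweighted mean over protocols: one assumed reading of 'on average'."""
    return mean(rates)


# Hypothetical data for illustration only; these are not the paper's measurements.
smtp_truth = {"HELO", "EHLO", "MAIL", "RCPT", "DATA", "QUIT"}
smtp_found = {"HELO", "EHLO", "MAIL", "RCPT", "QUIT"}  # DATA missed
pop3_truth = {"USER", "PASS", "STAT", "RETR", "QUIT"}
pop3_found = {"USER", "PASS", "STAT", "RETR", "QUIT"}  # all recovered

rates = [recovery_rate(smtp_found, smtp_truth), recovery_rate(pop3_found, pop3_truth)]
print(round(macro_average(rates), 3))  # 0.917
print(acceptance_rate([True, True, False, True]))  # 0.75
```

Macro-averaging gives each protocol equal weight. A micro-average over all message types would instead weight protocols with many commands, such as IMAP, more heavily, so the two readings can yield different figures.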

Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually, a process that is slow, error-prone, and difficult to scale. _Formal_ specifications, on the other hand, are well-suited for automated test generation, but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses large language models (LLMs) to bridge the gap: first, we extract _protocol elements_ from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal _protocol specification_ from these elements, which we can then use to massively test further implementations. We consider this two-stage approach superior to end-to-end LLM-based test generation because 1. it produces an _inspectable specification_ that preserves traceability to the original text; 2. the generation of actual test cases _no longer requires an LLM_; 3. the resulting formal specs are _human-readable_, and can be reviewed, version-controlled, and incrementally refined; and 4. over time, we can build a _corpus_ of natural-language-to-formal-specification mappings that can be used to further train and refine LLMs for more automatic translations. Our prototype, AUTOSPEC, successfully demonstrated the feasibility of our approach on five widely used _internet protocols_ (SMTP, POP3, IMAP, FTP, and ManageSieve) by applying its methods to their natural-language _RFC specifications_ and using the recent _I/O grammar_ formalism for protocol specification and fuzzing. In its evaluation, AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types, and achieves 81.5% message acceptance across diverse, real-world systems.
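Point 2 of the abstract says that once a formal spec exists, producing test inputs no longer requires an LLM. The sketch below illustrates that idea by randomly expanding a grammar, which is a purely mechanical process. The grammar is a hypothetical, heavily simplified subset of SMTP client commands written as a plain context-free grammar. It is neither AUTOSPEC's output nor I/O grammar notation, and it ignores the server side and message ordering that the synthesized specs also cover.

```python
import random

# Hypothetical, simplified grammar for a few SMTP client commands.
# Keys are nonterminals; each maps to a list of alternatives, and each
# alternative is a sequence of symbols. Any symbol that is not a key is a terminal.
GRAMMAR: dict[str, list[list[str]]] = {
    "<command>": [["<helo>"], ["<mail>"], ["<rcpt>"], ["<quit>"]],
    "<helo>": [["HELO ", "<domain>", "\r\n"]],
    "<mail>": [["MAIL FROM:<", "<address>", ">\r\n"]],
    "<rcpt>": [["RCPT TO:<", "<address>", ">\r\n"]],
    "<quit>": [["QUIT\r\n"]],
    "<address>": [["<local>", "@", "<domain>"]],
    "<local>": [["alice"], ["bob"]],
    "<domain>": [["example.com"], ["example.org"]],
}


def expand(symbol: str, grammar: dict[str, list[list[str]]], rng: random.Random) -> str:
    """Recursively replace a nonterminal with a randomly chosen alternative."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    alternative = rng.choice(grammar[symbol])
    return "".join(expand(s, grammar, rng) for s in alternative)


def generate_tests(n: int, seed: int = 0) -> list[str]:
    """Produce n client messages; a fixed seed makes test runs reproducible."""
    rng = random.Random(seed)
    return [expand("<command>", GRAMMAR, rng) for _ in range(n)]


for message in generate_tests(5):
    print(repr(message))
```

Because generation only walks the grammar, it can produce arbitrarily many messages at negligible cost. A full protocol spec would additionally constrain the order of messages and check the implementation's replies.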

