CR · Apr 30

SecGoal: A Benchmark for Security Goal Extraction and Formalization from Protocol Documents

arXiv:2604.27601 · 65.3
Predicted impact: top 25% in CR · last 90 days
Originality: Incremental advance
AI Analysis

For researchers in formal verification and protocol analysis, this benchmark and framework address the bottleneck of automating security goal extraction from natural language, providing a reproducible baseline.

SecGoal introduces the first expert-annotated benchmark for security goal extraction from protocol documents, covering 15 protocols including 5G-AKA and TLS 1.3. Compact 7B/9B models instruction-tuned on SecGoal achieve F1-scores above 80%, substantially outperforming larger general-purpose models such as Gemini 2.5-Pro, which reaches high recall but precision below 15%.

Formal verification provides rigorous guarantees for cryptographic security, yet automating the extraction and formalization of security goals from natural language protocol documents remains a major bottleneck, compounded by the scarcity of expert-annotated resources and integrated frameworks bridging unstructured text and symbolic logic. We introduce SecGoal, the first expert-annotated benchmark covering 15 widely deployed protocol documents, including 5G-AKA and TLS 1.3, and AIFG, an AI-assisted framework that decomposes the task into context-aware goal extraction and retrieval-augmented formalization. We conduct a comprehensive evaluation to assess whether contemporary LLMs are ready to automate this pipeline. Our results reveal a pronounced precision-recall imbalance: frontier models, such as Gemini 2.5-Pro, achieve high recall but precision below 15%, frequently misclassifying operational text as security goals. In contrast, instruction tuning on SecGoal enables compact models with 7B/9B parameters to achieve F1-scores above 80%, substantially outperforming larger general-purpose models. Our work establishes a foundational dataset and reproducible baseline for automated formal protocol analysis.
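To make the precision-recall imbalance concrete, the sketch below applies the standard F1 definition (the harmonic mean of precision and recall) to the one figure the abstract reports for frontier models, precision below 15%. The recall value of 1.0 is an assumption chosen as the best case, not a reported result; it gives an upper bound on the F1 such a model could reach.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision ceiling reported in the abstract for frontier models.
frontier_precision = 0.15
# Assumption for illustration: perfect recall (best case, not a reported number).
best_case_recall = 1.0

# Upper bound on F1 for any model whose precision stays below 15%.
f1_upper_bound = f1_score(frontier_precision, best_case_recall)
print(f"F1 upper bound at 15% precision: {f1_upper_bound:.3f}")
```

Even under the best-case recall assumption, F1 stays near 0.26, which is why a model that flags most operational text as security goals cannot approach the 80% F1 reported for the instruction-tuned 7B/9B models.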

