SE · AI · LG · PL · Feb 21, 2025

FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs

arXiv:2502.15217v1 · 5 citations · h-index: 10 · Has Code · MSR
Originality: Synthesis-oriented
AI Analysis

This dataset fills a gap for researchers and developers working on specification inference tools and AI-assisted software development, but the contribution is incremental: it provides a new resource rather than a novel method.

The authors address the lack of standardized benchmarks for verifying formal specifications in C++ programs by creating FormalSpecCpp, a comprehensive dataset of C++ programs annotated with preconditions and postconditions. They make it publicly available to advance research in program verification and AI-assisted software development.

FormalSpecCpp is a dataset designed to fill the gap in standardized benchmarks for verifying formal specifications in C++ programs. To the best of our knowledge, it is the first comprehensive collection of C++ programs with well-defined preconditions and postconditions. It provides a structured benchmark for evaluating specification inference tools and testing the accuracy of generated specifications. Researchers and developers can use this dataset to benchmark specification inference tools, fine-tune Large Language Models (LLMs) for automated specification generation, and analyze the role of formal specifications in improving program verification and automated testing. By making this dataset publicly available, we aim to advance research in program verification, specification inference, and AI-assisted software development. The dataset and the code are available at https://github.com/MadhuNimmo/FormalSpecCpp.
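To make "C++ programs with preconditions and postconditions" concrete, here is a minimal sketch under stated assumptions. It is not taken from FormalSpecCpp, whose actual specification encoding may differ. The function `max_value`, the predicates `pre_nonempty` and `post_is_max`, and the `post_consistent` harness are all hypothetical. Specifications are written as plain C++ predicates, and a candidate postcondition (for example, one produced by an LLM) is checked against the implementation on a few sample inputs that satisfy the precondition. This is a runtime sanity check, not formal verification.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical spec encoding: a precondition is a predicate over the input,
// a postcondition is a predicate over the input and the returned result.
using Pre = std::function<bool(const std::vector<int>&)>;
using Post = std::function<bool(const std::vector<int>&, int)>;

// Hypothetical program under specification: returns the largest element.
int max_value(const std::vector<int>& a) {
    assert(!a.empty());  // precondition enforced at runtime (debug builds)
    int best = a[0];
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (a[i] > best) best = a[i];
    }
    return best;
}

// Reference precondition: the input must be non-empty.
bool pre_nonempty(const std::vector<int>& a) { return !a.empty(); }

// Reference postcondition: no element exceeds the result,
// and the result occurs in the input.
bool post_is_max(const std::vector<int>& a, int r) {
    bool found = false;
    for (int x : a) {
        if (x > r) return false;
        if (x == r) found = true;
    }
    return found;
}

// Runs max_value on every sample input that satisfies `pre` and reports
// whether `post` held for all of them. Passing says nothing about inputs
// outside the sample set.
bool post_consistent(const Pre& pre, const Post& post,
                     const std::vector<std::vector<int>>& inputs) {
    for (const auto& in : inputs) {
        if (!pre(in)) continue;  // skip inputs outside the precondition
        if (!post(in, max_value(in))) return false;
    }
    return true;
}
```

For example, the reference postcondition holds on `{{1}, {3, 9, 2, 7}, {-5, -2}}`, while a weaker, incorrect candidate such as "the result equals the first element" is rejected as soon as the inputs include `{3, 9}`. Note that `assert` is compiled out when `NDEBUG` is defined, so the precondition check inside `max_value` only fires in debug builds.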

Code Implementations: 1 repo