FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs
This dataset fills a gap for researchers and developers working on specification inference tools and AI-assisted software development. The contribution is incremental: it provides a new resource rather than a novel method.
The authors address the lack of standardized benchmarks for verifying formal specifications in C++ programs by creating FormalSpecCpp, a comprehensive dataset of C++ programs annotated with preconditions and postconditions. They made it publicly available to advance research in program verification and AI-assisted software development.
FormalSpecCpp is a dataset designed to fill the gap in standardized benchmarks for verifying formal specifications in C++ programs. To the best of our knowledge, this is the first comprehensive collection of C++ programs with well-defined preconditions and postconditions. It provides a structured benchmark for evaluating specification inference tools and testing the accuracy of generated specifications. Researchers and developers can use this dataset to benchmark specification inference tools, fine-tune Large Language Models (LLMs) for automated specification generation, and analyze the role of formal specifications in improving program verification and automated testing. By making this dataset publicly available, we aim to advance research in program verification, specification inference, and AI-assisted software development. The dataset and the code are available at https://github.com/MadhuNimmo/FormalSpecCpp.
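To make the notion of a C++ program with preconditions and postconditions concrete, the following is a minimal illustrative sketch. It is not taken from the dataset: the function name `max_element_of` is hypothetical, and the use of the standard `assert` macro to express the specification is an assumption made here for illustration. The dataset's actual annotation style may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical example (not from FormalSpecCpp): returns the largest
// value in a non-empty vector, with its specification written as
// runtime-checked assertions.
int max_element_of(const std::vector<int>& values) {
    // Precondition: the input must contain at least one element.
    assert(!values.empty());

    int best = values[0];
    for (int v : values) {
        if (v > best) {
            best = v;
        }
    }

    // Postcondition 1: the result is an upper bound on every element.
    assert(std::all_of(values.begin(), values.end(),
                       [best](int v) { return v <= best; }));
    // Postcondition 2: the result actually occurs in the input.
    assert(std::find(values.begin(), values.end(), best) != values.end());

    return best;
}
```

In this sketch, a specification inference tool would be expected to produce conditions equivalent to the three assertions given only the function body. A benchmark entry of this kind lets the generated conditions be compared against a reference specification.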