SESep 26, 2025Code
A benchmark for vericoding: formally verified program synthesisSergiu Bursuc, Theodore Ehrenborg, Shaowei Lin et al.
We present and test the largest benchmark for vericoding, LLM-generation of formally verified code from formal specifications - in contrast to vibe coding, which generates potentially buggy code from a natural language description. Our benchmark contains 12,504 formal specifications, with 3,029 in Dafny, 2,334 in Verus/Rust and 7,141 in Lean. Of these, 6,174 are new unseen problems. We find vericoding success rates of 27% in Lean, 44% in Verus/Rust and 82% in Dafny using off-the-shelf LLMs. Adding natural-language descriptions does not significantly improve performance. We also find that LLM progress has improved progress on pure Dafny verification from 68% to 96% over the past year. The benchmark and vericoding results are shared at https://github.com/Beneficial-AI-Foundation/vericoding-benchmark
CRJan 30, 2017
Automated verification of dynamic root of trust protocols (long version)Sergiu Bursuc, Christian Johansen, Shiwei Xu
Automated verification of security protocols based on dynamic root of trust, typically relying on protected hardware such as TPM, involves several challenges that we address in this paper. We model the semantics of trusted computing platforms (including CPU, TPM, OS, and other essential components) and of associated protocols in a classical process calculus accepted by ProVerif. As part of the formalization effort, we introduce new equational theories for representing TPM specific platform states and dynamically loaded programs. Formal models for such an extensive set of features cannot be readily handled by ProVerif, due especially to the search space generated by unbounded extensions of TPM registers. In this context we introduce a transformation of the TPM process, that simplifies the structure of the search space for automated verification, while preserving the security properties of interest. This allows to run ProVerif on our proposed models, so we can derive automatically security guarantees for protocols running in a dynamic root of trust context.