AR · AI · May 26

AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications

Yuchao Wu, Wenji Fang, Jing Wang, Wenkai Li, Ziyan Guo, Zhiyao Xie

arXiv:2605.27472 · 83.9 · h-index: 11 · Has Code

Predicted impact top 1% in AR · last 90 days · Originality Incremental advance

AI Analysis

For hardware verification engineers, this benchmark provides a more realistic evaluation of LLM-based assertion generation, addressing limitations of prior benchmarks with structured specifications and buggy RTL inputs.

AssertLLM2 introduces a benchmark for generating SystemVerilog Assertions from design specifications, featuring 83 real-world designs with buggy RTL variants to evaluate both bug-prevention and bug-hunting capabilities. It establishes rigorous baselines for LLMs, showing that current models achieve limited success in realistic settings.

Assertion-based verification (ABV) is a cornerstone of modern hardware design, yet manually translating design intent into formal SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. While Large Language Models (LLMs) show promise for automating this process, existing benchmarks remain limited by unrealistic task formulations, weak specification inputs, and oversimplified evaluation. To address these limitations, we introduce AssertLLM2, an open-source benchmark for realistic assertion generation in hardware verification. AssertLLM2 contains 83 real-world designs across 13 functional categories. For each design, the benchmark provides a structured design specification, a verified dependency-complete golden RTL, and systematically mutated buggy RTL variants. These support two practical settings: bug-prevention, where assertions are generated from specifications to guard against design errors, and bug-hunting, where assertions are generated to expose discrepancies between intended behavior and faulty implementations. To the best of our knowledge, AssertLLM2 is the first benchmark to explicitly use buggy RTL as input to evaluate bug-detection capability. AssertLLM2 further adopts a more rigorous evaluation framework spanning syntactic validity, formal provability, coverage, and mutation-based bug detection. Our benchmark enables a more realistic and extensive assessment of assertion generation and establishes rigorous baselines for state-of-the-art LLMs in practical hardware verification.
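
The mutation-based bug-detection part of the evaluation can be illustrated with a small sketch. The abstract does not state a scoring formula, so the following assumes a simple one: a mutant counts as detected if at least one assertion that holds on the golden RTL fails on that mutant, and the detection rate is the fraction of mutants detected. All names (`AssertionResult`, `detection_rate`) and the sample data are hypothetical; in practice the pass/fail outcomes would come from a formal verification tool, not from hand-written records.

```python
from dataclasses import dataclass, field


@dataclass
class AssertionResult:
    """Outcome of one assertion (hypothetical record, not from the benchmark)."""
    name: str
    proven_on_golden: bool                       # holds on the golden RTL
    fails_on: set = field(default_factory=set)   # mutant IDs where it fails


def detection_rate(results, mutant_ids):
    """Fraction of mutants caught by at least one assertion valid on golden RTL."""
    if not mutant_ids:
        return 0.0
    detected = set()
    for r in results:
        # Assertions that do not hold on the golden design are ignored:
        # their failure on a mutant is not evidence of bug detection.
        if r.proven_on_golden:
            detected |= r.fails_on & set(mutant_ids)
    return len(detected) / len(mutant_ids)


# Made-up example data: three assertions, four buggy RTL variants.
results = [
    AssertionResult("a_req_ack", True, {"m1", "m3"}),
    AssertionResult("a_fifo_full", True, {"m3"}),
    AssertionResult("a_bad", False, {"m2"}),  # invalid on golden, so not counted
]
rate = detection_rate(results, ["m1", "m2", "m3", "m4"])
print(rate)
```

Filtering on `proven_on_golden` reflects the idea that an assertion must first be correct with respect to the intended design before its failure on a buggy variant is meaningful; whether the benchmark applies exactly this filter is an assumption here.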
