Yang Zhao

h-index: 46

Papers: 4

Citations: 85

Novelty: 45%

AI Score: 49

Ranked #25,796 of 194,257 authors (top 13%); #6,168 in LG (top 15%)

4 Papers

18.1 | PL | Jul 15, 2024 | Code

CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization

Yang Zhao, Di Huang, Chongxiao Li et al.

The design flow of processors, particularly in hardware description languages (HDLs) such as Verilog and Chisel, is complex and costly. While recent advances in large language models (LLMs) have significantly improved coding in software languages such as Python, their application to HDL generation remains limited by the scarcity of high-quality HDL data. Prior methods for adapting LLMs to hardware design rely on synthetic HDL datasets, which often suffer from low quality because even advanced LLMs like GPT perform poorly in the HDL domain. Moreover, these methods focus solely on chat tasks and the Verilog language, limiting their application scenarios. In this paper, we observe that: (1) HDL code collected from the real world is of higher quality than code generated by LLMs; (2) LLMs like GPT-3.5 are better at summarizing HDL code than at generating it; and (3) an explicit language tag helps LLMs adapt to the target language when data are insufficient. Based on these observations, we propose an efficient LLM fine-tuning pipeline for HDL generation that combines a multi-level summarization data synthesis process with a novel Chat-FIM-Tag supervised fine-tuning method. The pipeline improves HDL code generation from natural language descriptions and supports multiple tasks, such as chat and infilling incomplete code. Using this pipeline, we introduce CodeV, a series of HDL generation LLMs. Among them, CodeV-All covers more languages (Verilog and Chisel) and more tasks (chat and fill-in-the-middle, FIM), yet achieves performance on VerilogEval comparable to, or even surpassing, that of CodeV-Verilog, which is fine-tuned on Verilog only. The CodeV models are the first series of open-source LLMs designed for multi-scenario HDL generation.
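The abstract does not spell out the Chat-FIM-Tag data format, so the following is only a minimal sketch under stated assumptions: PSM-style fill-in-the-middle with hypothetical `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel tokens and a hypothetical `// language: <lang>` tag line. It shows how a single real HDL snippet could yield both a chat sample and an infilling sample, each prefixed by an explicit language tag.

```python
import random

# Hypothetical sentinel tokens; CodeV's actual special tokens may differ.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def language_tag(lang: str) -> str:
    """Hypothetical explicit language tag placed before every prompt."""
    return f"// language: {lang.lower()}"


def make_chat_example(lang: str, description: str, code: str) -> dict:
    """Chat sample: NL description (e.g. an LLM-written summary) -> real HDL code."""
    return {"prompt": f"{language_tag(lang)}\n{description}", "completion": code}


def make_fim_example(lang: str, code: str, rng: random.Random) -> dict:
    """FIM sample: cut non-empty code at two distinct random points, predict the middle."""
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    prompt = f"{language_tag(lang)}\n{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return {"prompt": prompt, "completion": middle}


verilog = "module inv(input a, output y);\n  assign y = ~a;\nendmodule\n"
chat = make_chat_example("Verilog", "A one-bit inverter.", verilog)
fim = make_fim_example("Verilog", verilog, random.Random(0))
print(chat["prompt"])
print(repr(fim["completion"]))
```

Because the tag is a plain text prefix rather than a separate model input, the same formatting applies unchanged to Chisel snippets by passing `"Chisel"` as the language.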

5.6 | SE | Mar 17

SoK: Systematizing Software Artifacts Traceability via Associations, Techniques, and Applications

Zhifei Chen, Lata Yi, Liming Nie et al.

Software development relies heavily on traceability links between software artifacts to ensure quality and facilitate maintenance. While automated traceability recovery techniques have advanced for different artifact pairs, the field remains fragmented: the overview of artifact associations is incomplete, linking techniques are ambiguous, and knowledge of application scenarios is scattered. To bridge these gaps, we conducted a systematic literature review on software traceability recovery, synthesizing the linked artifacts, recovery tools, and usage scenarios across the traceability ecosystem. First, we constructed the first global artifact traceability graph, covering 23 associations among 22 artifact types, which exposes a severe research imbalance heavily favoring code-related links. Second, while recovery techniques are shifting toward deep semantic models, a reproducibility crisis persists (e.g., only 37% of studies released code); to address this, we provided a comprehensive evaluation framework that includes a technical decision map and standardized benchmarks. Finally, we quantified an industrial adoption gap (95% of tools remain confined to academia) and proposed a role-centric framework that dynamically aligns artifact paths with concrete engineering activities. This review contributes a coherent knowledge framework for artifact traceability research, identifies current trends, and provides directions for future work.

29.3 | LG | May 30, 2025 | Code

QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation

Yaoyu Zhu, Di Huang, Hanqi Lyu et al.

Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially to automatically generating hardware description languages (HDLs) like Verilog from natural-language (NL) specifications, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computational cost of RLVR. To address them, we introduce CodeV-R1, an RLVR framework for training Verilog generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbench, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage "distill-then-RL" training pipeline: distillation for the cold start of reasoning abilities, followed by adaptive DAPO, our novel RLVR algorithm that can reduce training cost by adaptively adjusting the sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing the prior state of the art by 12% to 20% and even exceeding the 671B DeepSeek-R1 on RTLLM. We have released our model, training code, and dataset to facilitate research in the EDA and LLM communities.
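The round-trip filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `summarize`, `generate`, and `equivalent` are hypothetical callables standing in for an LLM summarizer, an LLM generator, and the testbench-based equivalence check, and the toy stand-ins below use a whitespace-insensitive text match purely so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    description: str  # LLM-written natural-language spec
    code: str  # original open-source Verilog, kept as the golden reference


def round_trip_filter(
    snippets: Iterable[str],
    summarize: Callable[[str], str],
    generate: Callable[[str], str],
    equivalent: Callable[[str, str], bool],
) -> list[Example]:
    """Keep (NL, code) pairs whose NL regenerates an equivalent design.

    summarize:  code -> NL description (an LLM call in practice)
    generate:   NL description -> code (an LLM call in practice)
    equivalent: golden vs. regenerated code (testbench simulation in practice)
    """
    kept: list[Example] = []
    for code in snippets:
        description = summarize(code)
        regenerated = generate(description)
        if equivalent(code, regenerated):
            kept.append(Example(description=description, code=code))
    return kept


# Toy stand-ins so the sketch runs; real use would call an LLM and a simulator.
snippets = ["assign y = a & b;", "assign y = a | b;"]


def toy_summarize(code: str) -> str:
    return "AND gate" if "&" in code else "OR gate"


def toy_generate(description: str) -> str:
    # Deliberately wrong for the OR gate, so that example is filtered out.
    return "assign y = a & b;"


def toy_equivalent(golden: str, candidate: str) -> bool:
    # Whitespace-insensitive text match; a placeholder for real equivalence checking.
    return golden.split() == candidate.split()


kept = round_trip_filter(snippets, toy_summarize, toy_generate, toy_equivalent)
print([(ex.description, ex.code) for ex in kept])  # prints [('AND gate', 'assign y = a & b;')]
```

Note that the kept pair stores the original open-source code, not the regenerated code, so the dataset inherits the higher quality of real-world Verilog while the round trip only serves as a check on the description.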

15.7 | LG | Jul 22, 2025 | Code

RealBench: Benchmarking Verilog Generation Models with Real-World IP Designs

Pengwei Jin, Di Huang, Chongxiao Li et al.

The automatic generation of Verilog code using Large Language Models (LLMs) has garnered significant interest in hardware design automation. However, existing benchmarks for evaluating LLMs in Verilog generation fall short of replicating real-world design workflows because of their simple designs, inadequate design specifications, and less rigorous verification environments. To address these limitations, we present RealBench, the first benchmark targeting real-world IP-level Verilog generation tasks. RealBench features complex, structured, real-world open-source IP designs, multi-modal and formatted design specifications, and rigorous verification environments, including testbenches with 100% line coverage and a formal checker. It supports both module-level and system-level tasks, enabling comprehensive assessments of LLM capabilities. Evaluations of various LLMs and agents reveal that even o1-preview, one of the best-performing LLMs, achieves only 13.3% pass@1 on module-level tasks and 0% on system-level tasks, highlighting the need for stronger Verilog generation models. The benchmark is open-sourced at https://github.com/IPRC-DIP/RealBench.
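Both RealBench and CodeV-R1 report pass@1. A common way to compute pass@k from n sampled completions per problem, c of which pass verification, is the unbiased estimator introduced with the HumanEval benchmark; whether these papers use exactly this estimator is an assumption here, and the counts in the example are made up for illustration.

```python
from math import comb, isclose


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Benchmark score: average pass@k over problems given (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up counts for three problems, 10 samples each (not results from the papers).
results = [(10, 3), (10, 0), (10, 10)]
score = mean_pass_at_k(results, k=1)
print(score)
```

For k = 1 the estimator reduces to c / n, the fraction of passing samples, so a 0% system-level pass@1 means no sampled completion passed verification on any system-level task.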