HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
This work addresses the problem of improving LLM performance on hardware design tasks for engineers and developers. It is an incremental advance that combines existing techniques, ASTs and DFGs, with a novel dual-retrieval mechanism.
The paper tackles the challenge of applying Large Language Models to real-world Hardware Description Language (HDL) projects with thousands of lines of code. It proposes HDLxGraph, a framework that integrates Graph Retrieval Augmented Generation with HDL-specific graph representations, and reports improvements of 12.04% in search accuracy, 12.22% in debugging efficiency, and 5.04% in completion quality over similarity-based methods.
Large Language Models (LLMs) have demonstrated their potential in hardware design tasks such as Hardware Description Language (HDL) generation and debugging. Yet their performance on real-world, repository-level HDL projects with thousands or even tens of thousands of lines of code remains limited. To this end, we propose HDLxGraph, a novel framework that integrates Graph Retrieval Augmented Generation (Graph RAG) with LLMs and introduces HDL-specific graph representations that incorporate Abstract Syntax Trees (ASTs) and Data Flow Graphs (DFGs) to capture both a code graph view and a hardware graph view. HDLxGraph uses a dual-retrieval mechanism that not only mitigates the limited recall inherent in similarity-based semantic retrieval by incorporating structural information, but also extends to various real-world tasks through task-specific retrieval fine-tuning. Additionally, to address the lack of comprehensive HDL search benchmarks, we introduce HDLSearch, a multi-granularity evaluation dataset derived from real-world repository-level projects. Experimental results show that HDLxGraph significantly improves average search accuracy, debugging efficiency, and completion quality by 12.04%, 12.22%, and 5.04%, respectively, compared to similarity-based RAG. The code of HDLxGraph and the collected HDLSearch benchmark are available at https://github.com/Nick-Zheng-Q/HDLxGraph.
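The abstract does not specify how the semantic and structural retrieval paths are combined. The following is a minimal sketch of the general idea, augmenting similarity-based retrieval with graph neighbors, under loudly stated assumptions: a bag-of-words cosine similarity stands in for a learned code embedding, and a hand-built graph stands in for parsed ASTs and DFGs. All names (`HDLGraph`, `dual_retrieve`, `build_example_graph`, and the example modules) are hypothetical and are not taken from the HDLxGraph repository.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> Counter:
    """Bag-of-words vector; a hypothetical stand-in for a learned code embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HDLGraph:
    """Toy graph (hypothetical): nodes are HDL entities with descriptive text,
    edges are structural links such as AST containment or DFG signal flow."""

    def __init__(self) -> None:
        self.text: dict[str, str] = {}
        self.adj: defaultdict[str, set[str]] = defaultdict(set)

    def add_node(self, name: str, text: str) -> None:
        self.text[name] = text

    def add_edge(self, a: str, b: str) -> None:
        # Stored undirected for simplicity; a real DFG edge would be directed.
        self.adj[a].add(b)
        self.adj[b].add(a)


def dual_retrieve(g: HDLGraph, query: str, k: int = 1, hops: int = 1) -> list[str]:
    """Hypothetical two-stage retrieval: similarity seeds, then graph expansion."""
    q = tokenize(query)
    # Stage 1: semantic retrieval, keeping the top-k nodes by similarity.
    ranked = sorted(g.text, key=lambda n: cosine(q, tokenize(g.text[n])), reverse=True)
    seeds = ranked[:k]
    # Stage 2: structural expansion, breadth-first up to `hops` edges away.
    result, seen, frontier = list(seeds), set(seeds), list(seeds)
    for _ in range(hops):
        nxt = []
        for name in frontier:
            for nb in sorted(g.adj[name]):
                if nb not in seen:
                    seen.add(nb)
                    result.append(nb)
                    nxt.append(nb)
        frontier = nxt
    return result


def build_example_graph() -> HDLGraph:
    g = HDLGraph()
    g.add_node("uart_tx", "uart transmitter shift register baud tick")
    g.add_node("baud_gen", "clock divider generates baud tick")
    g.add_node("fifo", "synchronous fifo buffer")
    # Hypothetical data-flow link: a baud tick signal flows from baud_gen to uart_tx.
    g.add_edge("uart_tx", "baud_gen")
    return g


if __name__ == "__main__":
    g = build_example_graph()
    print(dual_retrieve(g, "uart transmitter", k=1, hops=0))  # ['uart_tx']
    print(dual_retrieve(g, "uart transmitter", k=1, hops=1))  # ['uart_tx', 'baud_gen']
```

With `hops=0` the sketch reduces to plain similarity retrieval and misses `baud_gen`, whose text shares no words with the query; one hop along the data-flow edge recovers it. This illustrates the kind of recall gap the abstract attributes to similarity-only RAG, not the paper's actual retrieval or fine-tuning procedure.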