LGPL
Feb 25, 2025

DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis

arXiv:2502.18297v1
24 citations
h-index: 11
2025 IEEE International Conference on LLM-Aided Design (ICLAD)
Originality Synthesis-oriented
AI Analysis

This dataset addresses the need for holistic resources in hardware design automation and enables more nuanced ML applications for RTL tasks, though its contribution is incremental because it builds on existing data types.

The paper introduces DeepCircuitX, a comprehensive repository-level dataset for RTL code understanding, generation, and PPA analysis, which includes multilevel RTL code, CoT annotations, and PPA metrics, and demonstrates its effectiveness through LLM finetuning and human evaluations.

This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs finetuned with our dataset and confirm the quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation. Our data is available at https://zeju.gitbook.io/lcm-team.

