Tianyun Ma

AR
6papers
54citations
Novelty53%
AI Score48

6 Papers

PLJul 15, 2024Code
CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization

Yang Zhao, Di Huang, Chongxiao Li et al.

The design flow of processors, particularly in hardware description languages (HDL) like Verilog and Chisel, is complex and costly. While recent advances in large language models (LLMs) have significantly improved coding tasks in software languages such as Python, their application in HDL generation remains limited due to the scarcity of high-quality HDL data. Traditional methods of adapting LLMs for hardware design rely on synthetic HDL datasets, which often suffer from low quality because even advanced LLMs like GPT perform poorly in the HDL domain. Moreover, these methods focus solely on chat tasks and the Verilog language, limiting their application scenarios. In this paper, we observe that: (1) HDL code collected from the real world is of higher quality than code generated by LLMs. (2) LLMs like GPT-3.5 excel in summarizing HDL code rather than generating it. (3) An explicit language tag can help LLMs better adapt to the target language when there is insufficient data. Based on these observations, we propose an efficient LLM fine-tuning pipeline for HDL generation that integrates a multi-level summarization data synthesis process with a novel Chat-FIM-Tag supervised fine-tuning method. The pipeline enhances the generation of HDL code from natural language descriptions and enables the handling of various tasks such as chat and infilling incomplete code. Utilizing this pipeline, we introduce CodeV, a series of HDL generation LLMs. Among them, CodeV-All not only possesses a more diverse range of language abilities, i.e. Verilog and Chisel, and a broader scope of tasks, i.e. Chat and fill-in-middle (FIM), but it also achieves performance on VerilogEval that is comparable to or even surpasses that of CodeV-Verilog fine-tuned on Verilog only, making them the first series of open-source LLMs designed for multi-scenario HDL generation.

CRJul 12, 2024
TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Husheng Han, Xinyao Zheng, Yuanbo Wen et al.

Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.

90.6LGApr 20
AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning

Chongxiao Li, Pengwei Jin, Di Huang et al.

Performance, power, and area (PPA) optimization is a fundamental task in RTL design, requiring a precise understanding of circuit functionality and the relationship between circuit structures and PPA metrics. Recent studies attempt to automate this process using LLMs, but neither feedback-based nor knowledge-based methods are efficient enough, as they either design without any prior knowledge or rely heavily on human-summarized optimization rules. In this paper, we propose AutoPPA, a fully automated PPA optimization framework. The key idea is to automatically generate optimization rules that enhance the search for optimal solutions. To do this, AutoPPA employs an Explore-Evaluate-Induce ($E^2I$) workflow that contrasts and abstracts rules from diverse generated code pairs rather than manually defined prior knowledge, yielding better optimization patterns. To make the abstracted rules more generalizable, AutoPPA employs an adaptive multi-step search framework that adopts the most effective rules for a given circuit. Experiments show that AutoPPA outperforms both the manual optimization and the state-of-the-art methods SymRTLO and RTLRewriter.

18.8ARApr 17
Pecker: Bug Localization Framework for Sequential Designs via Causal Chain Reconstruction

Jiaping Tang, Jianan Mu, Tianyun Ma et al.

Debugging represents a time-consuming and labor-intensive task in hardware design, with bug localization constituting a substantial portion of this process. While spectrum-based bug localization techniques have achieved remarkable success in software domains and shown promise for hardware description languages, their effectiveness severely degrades in sequential designs. Unlike software programs, hardware designs exhibit intrinsic temporal characteristics that create fundamental challenges: timing misalignment between bug activation and observation, and progressive error propagation through state elements that obscures the root cause. To address these limitations, we propose Pecker, a novel bug localization framework that reconstructs the broken causal chain in sequential designs. Our approach introduces two key innovations: temporal backtracking using Estimated Minimal Propagation Cycles to identify potential activation cycles, strategic trace pruning to eliminate state pollution effects. We evaluate Pecker on comprehensive benchmarks comprising both combinational and sequential circuits. Experimental results demonstrate that Pecker effectively localizes 51%/80%/85% bugs within Top-1/3/5 ranks respectively, significantly outperforming state-of-the-art techniques. Notably, Pecker maintains robust performance across circuit complexities while existing methods exhibit severe degradation on sequential designs.

69.9ARMar 29
RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning

Xinyu Zhang, Zhiteng Chao, Yonghao Wang et al.

Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correctness and diversity. Our Diversity-Centric Multi-Objective Reward Scheduling integrates expert knowledge with EDA feedback, and a three-stage framework maximizes the utility of limited data. Experiments on the RTLLM benchmark show that RTLSeek surpasses prior methods, with ablation results confirming that encouraging broader design-space exploration improves RTL quality and achieves the principle of "the more generated, the better results." Implementation framework, including the dataset, source code, and model weights, is shown at https://anonymous.4open.science/r/DAC2026ID71-ACB4/.

LGSep 4, 2021
Eden: A Unified Environment Framework for Booming Reinforcement Learning Algorithms

Ruizhi Chen, Xiaoyu Wu, Yansong Pan et al.

With AlphaGo defeats top human players, reinforcement learning(RL) algorithms have gradually become the code-base of building stronger artificial intelligence(AI). The RL algorithm design firstly needs to adapt to the specific environment, so the designed environment guides the rapid and profound development of RL algorithms. However, the existing environments, which can be divided into real world games and customized toy environments, have obvious shortcomings. For real world games, it is designed for human entertainment, and too much difficult for most of RL researchers. For customized toy environments, there is no widely accepted unified evaluation standard for all RL algorithms. Therefore, we introduce the first virtual user-friendly environment framework for RL. In this framework, the environment can be easily configured to realize all kinds of RL tasks in the mainstream research. Then all the mainstream state-of-the-art(SOTA) RL algorithms can be conveniently evaluated and compared. Therefore, our contributions mainly includes the following aspects: 1.single configured environment for all classification of SOTA RL algorithms; 2.combined environment of more than one classification RL algorithms; 3.the evaluation standard for all kinds of RL algorithms. With all these efforts, a possibility for breeding an AI with capability of general competency in a variety of tasks is provided, and maybe it will open up a new chapter for AI.