PL AR LG | Jul 3, 2025

DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs

Mohammad Akyash, Kimia Azar, Hadi Kamali

arXiv:2507.02226v1 | 9 citations | h-index: 5 | Has Code | 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Originality: Incremental advance

AI Analysis

This work addresses the challenge of reliable RTL code generation for hardware design automation. It represents a domain-specific incremental improvement.

The paper tackles the problem of LLMs generating invalid RTL code by proposing DecoRTL, a run-time decoding framework that improves syntactic validity, functional correctness, and output diversity with imperceptible execution overhead.

As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. However, conventional LLM decoding strategies, originally designed for natural language, often fail to meet the structural and semantic demands of RTL, leading to hallucinated, repetitive, or invalid code outputs. In this paper, we first investigate the root causes of these decoding failures through an empirical analysis of token-level entropy during RTL generation. Our findings reveal that LLMs exhibit low confidence in regions of structural ambiguity or semantic complexity, showing that standard decoding strategies fail to differentiate between regions requiring determinism (syntax-critical regions) and those that benefit from creative, exploratory variability (design-critical regions). To overcome this, we introduce DecoRTL, a novel run-time decoding strategy that is both syntax-aware and contrastive for RTL code generation. DecoRTL integrates two complementary components: (i) self-consistency sampling, which generates multiple candidates and re-ranks them based on token-level agreement to promote correctness while maintaining diversity; and (ii) syntax-aware temperature adaptation, which classifies tokens by their syntactical and functional roles and adjusts the sampling temperature accordingly, enforcing low temperature for syntax-critical tokens and higher temperature for exploratory ones. Our approach operates entirely at inference time without requiring any additional model fine-tuning. Through evaluations on multiple open-source LLMs using the VerilogEval benchmark, we demonstrate significant improvements in syntactic validity, functional correctness, and output diversity, with imperceptible execution overhead.
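The three ideas named in the abstract (token-level entropy, syntax-aware temperature adaptation, and re-ranking by token-level agreement) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The keyword set `SYNTAX_CRITICAL`, the temperature values 0.2 and 1.0, the rule of classifying a single token by set membership, and the position-wise agreement score are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math

# Hypothetical set of syntax-critical Verilog tokens (an assumption, not from the paper).
SYNTAX_CRITICAL = {"module", "endmodule", "begin", "end", "always", "assign", ";"}


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_temperature(token, low=0.2, high=1.0):
    """Low temperature for syntax-critical tokens, higher otherwise (assumed values)."""
    return low if token in SYNTAX_CRITICAL else high


def agreement_score(candidate, others):
    """Fraction of (position, other candidate) pairs whose token matches the candidate's."""
    if not candidate or not others:
        return 0.0
    matches = 0
    for i, tok in enumerate(candidate):
        matches += sum(1 for o in others if i < len(o) and o[i] == tok)
    return matches / (len(candidate) * len(others))


def rerank(candidates):
    """Order candidates by agreement with the rest, highest first (stable sort)."""
    scored = []
    for i, cand in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        scored.append((agreement_score(cand, others), cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for tok in ("endmodule", "counter"):
        t = adaptive_temperature(tok)
        print(tok, t, round(token_entropy(softmax(logits, t)), 3))

    a = ["assign", "y", "=", "a", "&", "b", ";"]
    c = ["assign", "y", "=", "a", "|", "b", ";"]
    print(rerank([c, a, list(a)])[0])
```

Lowering the temperature sharpens the distribution and reduces its entropy, which is the lever a syntax-aware scheme would pull where determinism matters. In the re-ranking step, the candidate that disagrees with the majority at one position falls to the bottom of the list.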
