LGJun 3

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

Peilong Zhou, Zhirong Chen, Cangyuan Li, Haoyu Gao, Kaiyan Chang, Ziming Qu, Ying Wang

arXiv:2606.0525376.7

Predicted impact top 25% in LG · last 90 daysOriginality Highly original

AI Analysis

For hardware designers using LLMs for RTL generation, this work enables physically optimized designs beyond functional correctness, with significant PPA improvements.

The paper introduces TTT-RTL, the first per-design test-time training framework for LLM-based RTL optimization, which adapts the policy using EDA feedback. On RTLLM v2.0, it reduces geometric-mean PPA product by 65.1%, outperforming the best frozen-policy baseline at 26.1%.

Large language models (LLMs) have shown increasing promise in generating functionally correct register-transfer-level (RTL) hardware designs. Recent systems improve further through EDA-integrated reinforcement learning with syntax, simulation, and PPA rewards, but train a general RTL generator before deployment while test-time approaches search with a frozen policy. We instead perform reinforcement learning at test time, allowing the LLM policy to adapt to executable EDA feedback for the specific RTL problem at hand. We propose TTT-RTL, to our knowledge the first per-design test-time training framework that closes the loop between an LLM policy and an EDA pipeline for RTL optimization. TTT-RTL samples candidate implementations, verifies them through syntax checking and simulation, scores valid designs using synthesis-derived PPA product, reuses high-reward variants through a PUCT-indexed design-state pool, and updates the policy with an entropic policy-gradient objective. To stabilize policy updates under sparse or plateaued rewards, we introduce an adaptive KL-budget controller that adjusts the entropy constraint using reference KL, effective sample size, and reward saturation signals. On RTLLM v2.0 under Nangate 45nm, TTT-RTL reduces the geometric-mean PPA product by 65.1% over the reference, outperforming the strongest published frozen-policy agent baseline at 26.1%. On an industrial XuanTie C910 FPU leading-zero-anticipation unit under Sky130, TTT-RTL achieves a 59.4% ADP reduction, and ablations confirm that policy adaptation, state reuse, and KL-budget control each contribute. These results suggest that test-time training with executable EDA feedback can move LLM-based RTL generation beyond functional correctness toward physically optimized hardware.

View on arXiv PDF

Similar