AR Apr 18

From Natural Language to Silicon: The Representation Bottleneck in LLM Hardware Design

Weimin Fu, Zeng Wang, Minghao Shao, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri, Muhammad Shafique, Xiaolong Guo

arXiv:2604.17097

AI Analysis

For domain engineers seeking to use LLMs for custom hardware design, this work identifies the representation bottleneck as the key obstacle, showing that IR choice matters more than model choice.

LLMs can generate FPGA designs from natural language, but the intermediate representation (IR), not the model, is the dominant factor in success. Simulation pass rates range from 3% to 88% across IRs, while within any single IR they typically vary by less than 1.25x across models. On the resource-constrained Lattice iCE40UP5K, LLM designs reach a conditional FPGA pass rate of 86.5% vs. 68.7% for reference solutions, not because they are better but because a simplicity bias makes them small enough to fit.

Edge applications increasingly demand custom hardware, yet Field-Programmable Gate Array (FPGA) design requires expertise that domain engineers lack. Large Language Models (LLMs) promise to bridge this gap through zero-knowledge hardware programming, where users describe circuits in natural language and an LLM compiles them to a hardware intermediate representation (IR) targeting silicon. Modeling this flow as a cascade of binary filters, this work demonstrates that IR choice, not model choice, is the dominant factor governing end-to-end success, a phenomenon termed the representation bottleneck. An evaluation of three frontier LLMs across six IRs spanning Verilog, VHDL, Chisel, Bluespec, PyMTL3, and HLS C on 202 tasks through a pipeline of compilation, simulation, FPGA synthesis on a Lattice iCE40UP5K, and LLM-based repair shows that simulation pass rates range from 3% to 88% across IRs but typically vary less than 1.25x across models within any single IR. On the resource-constrained iCE40, LLM designs achieve a higher conditional FPGA pass rate than reference solutions, 86.5% vs. 68.7%, not because they are better but because a simplicity bias makes them small enough to fit. The analysis reveals an accessibility-competence paradox: the most user-friendly IRs yield the worst LLM performance, suggesting that optimal IR selection will evolve as LLM capabilities grow.
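The "cascade of binary filters" view can be illustrated with a minimal sketch. It assumes each task either passes or fails each stage in order (compilation, simulation, FPGA synthesis), so that end-to-end success is the product of the conditional pass rates. The `TaskResult` type, the `cascade_rates` helper, and the toy data are hypothetical and are not taken from the paper; the repair stage is omitted for brevity.

```python
from dataclasses import dataclass

# Stage order assumed from the pipeline description (repair stage omitted).
STAGES = ("compile", "simulate", "fpga")


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical per-task outcome: one pass/fail flag per stage."""
    compile: bool
    simulate: bool
    fpga: bool


def cascade_rates(results):
    """Return (conditional pass rate per stage, end-to-end pass rate).

    A stage's conditional rate is measured only over the tasks that
    survived every earlier stage, which is what makes the stages act
    as a cascade of filters.
    """
    survivors = list(results)
    total = len(survivors)
    conditional = {}
    for stage in STAGES:
        entered = len(survivors)
        survivors = [r for r in survivors if getattr(r, stage)]
        conditional[stage] = len(survivors) / entered if entered else 0.0
    end_to_end = len(survivors) / total if total else 0.0
    return conditional, end_to_end


# Toy data for illustration only; these are NOT results from the paper.
toy = [
    TaskResult(compile=True, simulate=True, fpga=True),
    TaskResult(compile=True, simulate=True, fpga=False),
    TaskResult(compile=True, simulate=False, fpga=False),
    TaskResult(compile=False, simulate=False, fpga=False),
]

conditional, end_to_end = cascade_rates(toy)
print(conditional)
print(end_to_end)
```

Under this reading, a conditional FPGA pass rate such as the 86.5% quoted above is computed only over designs that already passed the earlier stages, so it can be high even when few designs reach that stage.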
