75.4ARMar 14Code
Retrieve, Schedule, Reflect: LLM Agents for Chip QoR OptimizationYikang ouyang, Yang Luo, Dongsheng Zuo et al.
Modern chip design requires multi-objective optimization of timing, power, and area under stringent time-to-market constraints. Although powerful optimization algorithms are integrated into EDA tools, achieving high QoR hinges on effective long-horizon scheduling, which relies heavily on manual expert intervention. To address this issue and automate chip design, we propose an agentic LLM framework that schedules chip optimizations through direct interaction with EDA tools. The agent is grounded in natural language expertise expressed as a search tree through retrieval-augmented generation (RAG). We further improve scheduling quality with Pareto-driven QoR feedback through language reflection. Experimental results show that, compared with black-box search methods such as reinforcement learning, our framework achieves 10% greater timing improvement while consuming less power and area, with more than 4x speedup. The post-optimization QoR is also comparable to that achieved by human experts. Finally, the agent supports customized tasks expressed in natural language, enabling preferential QoR trade-offs. The code and chip design data will be publicly available at https://github.com/YiKangOY/Open-LLM-ECO.
ARMar 31, 2024
RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space ReductionDongsheng Zuo, Jiadong Zhu, Yikang Ouyang et al.
Multiplication is a fundamental operation in many applications, and multipliers are widely adopted in various circuits. However, optimizing multipliers is challenging due to the extensive design space. In this paper, we propose a multiplier design optimization framework based on reinforcement learning. We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network. The agent optimizes the multiplier structure using a Pareto-driven reward customized to balance area and delay. Furthermore, we enhance the original framework with parallel reinforcement learning and design space pruning techniques and extend its capability to optimize fused multiply-accumulate (MAC) designs. Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay. The performance gain is further validated by comparing the area, power, and delay of processing element arrays using multipliers from our approach and baseline approaches.