LG | May 10, 2024

Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

Yao Lai, Jinxin Liu, David Z. Pan, Ping Luo

Tsinghua

arXiv:2405.06758v114.210 citations | h-index: 7 | Has Code | NIPS

Originality: Incremental advance

AI Analysis

This work addresses hardware design bottlenecks for engineers and researchers by providing scalable methods to improve speed and reduce size in arithmetic units, though it appears incremental as it builds on existing reinforcement learning techniques.

The paper tackles the problem of optimizing arithmetic units (adders and multipliers) for hardware efficiency by formulating design tasks as single-player tree generation games using reinforcement learning. The approach discovers designs that reduce computational delay by up to 26% for adders and increase speed by up to 49% for multipliers compared to state-of-the-art methods.

Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, prior arithmetic design techniques prove inadequate, as they do not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. To boost arithmetic performance, in this work, we focus on the two most common and fundamental arithmetic modules: adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. Such a tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with the state-of-the-art PrefixRL, our method decreases computational delay and hardware size by up to 26% and 30%, respectively. For multipliers, when compared to RL-MUL, our approach increases speed and reduces size by as much as 49% and 45%, respectively. Moreover, the inherent flexibility and scalability of our method enable us to deploy our designs into cutting-edge technologies, as we show that they can be seamlessly integrated into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies. See our introduction video at https://bit.ly/ArithmeticTree. Code is released at https://github.com/laiyao1/ArithmeticTree.
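
To make the "arithmetic tree" and "theoretical metrics" in the abstract concrete, the following minimal Python sketch builds two textbook prefix trees for an n-bit adder (a serial ripple-style tree and a Sklansky-style tree), counts their operator nodes (size) and logic depth (level), and checks that both add correctly. This is not the authors' reinforcement learning method or code. It assumes that the adder trees in question are parallel-prefix trees and that size and level are the metrics meant, and all function names (`build_prefix_tree`, `size_and_level`, `prefix_add`, and the split rules) are hypothetical, chosen only for illustration.

```python
import random

def ripple_split(msb, lsb):
    # Serial rule (illustrative): peel off the top bit, reuse span [msb-1..lsb].
    return msb

def sklansky_split(msb, lsb):
    # Sklansky-style rule (illustrative): split at the largest power of two
    # that fits inside the span, so lower halves are shared between outputs.
    return lsb + (1 << ((msb - lsb).bit_length() - 1))

def build_prefix_tree(n, split):
    """Return {span: (upper_span, lower_span)} for every operator node
    needed to produce the carry spans (i, 0) for i = 1..n-1."""
    nodes = {}

    def expand(msb, lsb):
        if msb == lsb or (msb, lsb) in nodes:
            return  # input bit, or a node that already exists (shared)
        k = split(msb, lsb)  # upper covers [msb..k], lower covers [k-1..lsb]
        nodes[(msb, lsb)] = ((msb, k), (k - 1, lsb))
        expand(msb, k)
        expand(k - 1, lsb)

    for i in range(1, n):
        expand(i, 0)
    return nodes

def size_and_level(nodes):
    """Size = number of prefix operators; level = longest operator chain."""
    level = {}

    def depth(span):
        if span not in nodes:
            return 0  # single-bit inputs sit at level 0
        if span not in level:
            upper, lower = nodes[span]
            level[span] = 1 + max(depth(upper), depth(lower))
        return level[span]

    return len(nodes), max((depth(s) for s in nodes), default=0)

def prefix_add(a, b, n, nodes):
    """Add two n-bit integers using the (generate, propagate) prefix tree."""
    gp = {}
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gp[(i, i)] = (ai & bi, ai ^ bi)

    def value(span):
        if span not in gp:
            g_hi, p_hi = value(nodes[span][0])
            g_lo, p_lo = value(nodes[span][1])
            gp[span] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
        return gp[span]

    total = 0
    for i in range(n):
        carry_in = value((i - 1, 0))[0] if i > 0 else 0
        total |= (gp[(i, i)][1] ^ carry_in) << i
    return total | (value((n - 1, 0))[0] << n)  # append carry-out

if __name__ == "__main__":
    n = 8
    for name, rule in [("ripple", ripple_split), ("sklansky", sklansky_split)]:
        tree = build_prefix_tree(n, rule)
        a, b = random.getrandbits(n), random.getrandbits(n)
        assert prefix_add(a, b, n, tree) == a + b
        print(name, size_and_level(tree))
```

For 8 bits the serial tree uses 7 operators at level 7, while the Sklansky-style tree uses 12 operators at level 3. Neither dominates the other on both metrics, which is the kind of size-versus-level trade-off that a search over tree structures has to navigate.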
