AR AI Feb 26

Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

arXiv:2602.23334v1 | 1 citations | h-index: 13

Originality: Incremental advance

AI Analysis

This work provides a solution for efficiently executing mixed-precision QNNs on edge device hardware accelerators, which is important for deploying accurate and resource-efficient AI models.

This paper addresses the challenge of supporting multi-precision quantized neural networks (QNNs) on hardware accelerators by proposing a runtime-reconfigurable bitwise systolic array architecture. The proposed design achieves a 1.3185x to 3.5671x speedup for mixed-precision model inference and, thanks to a shorter critical path delay, supports a higher clock frequency (250 MHz).

Neural network accelerators are widely deployed on edge devices for complex tasks such as object tracking and image recognition. Previous work has explored quantization in lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Mixed-precision quantization therefore offers an alternative: it applies different precisions to different layers to trade off resource consumption against accuracy. Because conventional hardware multiplier designs cannot reconfigure precision at runtime for a multi-precision Quantized Neural Network (QNN) model, we propose a runtime-reconfigurable, multi-precision, multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our design achieves a 1.3185x to 3.5671x speedup in mixed-precision model inference and has a shorter critical path delay, supporting a higher clock frequency (250 MHz).
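The abstract does not spell out how a bitwise array multiplies operands of different widths, so the following is only a software sketch of the general idea, not the paper's hardware design. It assumes unsigned operands and decomposes a multiplication into 1-bit AND partial products that are shifted and accumulated, so the same loop handles any bit width chosen at call time. The function names `bitwise_multiply` and `dot_product` and their parameters are hypothetical, introduced here for illustration.

```python
from typing import Sequence


def bitwise_multiply(a: int, b: int, a_bits: int, b_bits: int) -> int:
    """Multiply two unsigned integers using only 1-bit ANDs and shift-adds.

    Assumption: operands are unsigned and fit in a_bits / b_bits.
    The bit widths are ordinary arguments, which loosely mirrors
    choosing the precision at runtime rather than at design time.
    """
    if not 0 <= a < (1 << a_bits):
        raise ValueError("a does not fit in a_bits")
    if not 0 <= b < (1 << b_bits):
        raise ValueError("b does not fit in b_bits")

    acc = 0
    for i in range(a_bits):
        a_i = (a >> i) & 1          # i-th bit of a
        for j in range(b_bits):
            b_j = (b >> j) & 1      # j-th bit of b
            # 1-bit partial product, weighted by its bit position.
            acc += (a_i & b_j) << (i + j)
    return acc


def dot_product(weights: Sequence[int], activations: Sequence[int],
                w_bits: int, x_bits: int) -> int:
    """Accumulate bitwise products, as one output of a layer would."""
    return sum(bitwise_multiply(w, x, w_bits, x_bits)
               for w, x in zip(weights, activations))


if __name__ == "__main__":
    # Same routine, two different precisions.
    print(bitwise_multiply(13, 11, 4, 4))            # 4-bit x 4-bit
    print(dot_product([3, 1, 2], [5, 7, 1], 2, 3))   # 2-bit weights, 3-bit activations
```

In this sketch a lower precision simply means fewer partial products per multiplication, which gives an intuition for why a layer quantized to fewer bits can finish sooner on bit-level hardware. Signed operands, the systolic data flow, and the multi-channel arrangement are outside what this example covers.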

