FPGA deep learning acceleration based on convolutional neural network
This work provides an incremental improvement in energy efficiency for deep learning acceleration on FPGAs, benefiting researchers and practitioners working on embedded or low-power AI systems.
This paper addresses the computational intensity of Convolutional Neural Networks (CNNs) by proposing an FPGA-based hardware accelerator. The accelerator achieves an energy efficiency ratio of 32.73 GOPS/W, which is 34% higher than existing solutions, and a performance of 317.86 GOPS.
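The two reported figures can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions not stated in the text: that performance and energy efficiency were measured on the same workload, so that power equals performance divided by efficiency, and that "34% higher" means a ratio of 1.34 over the baseline. Under those assumptions the implied power is roughly 9.7 W and the implied baseline efficiency is about 24.4 GOPS/W. The variable names are hypothetical.

```python
# Figures quoted in the text.
performance_gops = 317.86        # reported throughput, GOPS
efficiency_gops_per_w = 32.73    # reported energy efficiency, GOPS/W
improvement = 0.34               # reported relative gain over the existing solution

# Assumption: both figures describe the same run, so power = performance / efficiency.
implied_power_w = performance_gops / efficiency_gops_per_w

# Assumption: "34% higher" means new = 1.34 * baseline.
implied_baseline_gops_per_w = efficiency_gops_per_w / (1 + improvement)

print(f"implied power: {implied_power_w:.2f} W")
print(f"implied baseline efficiency: {implied_baseline_gops_per_w:.2f} GOPS/W")
```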
To address the heavy computational load and long run time of convolutional neural networks (CNNs), this paper proposes a CNN hardware accelerator based on a field-programmable gate array (FPGA). First, by analyzing the forward computation of the convolutional layer and exploring the parallelism available in it, a hardware architecture is designed that combines input-channel parallelism, output-channel parallelism, and deep pipelining of the convolution window. Within this architecture, a fully parallel multiply-add tree module accelerates the convolution operation, and an efficient window buffer module implements the pipelined operation of the convolution window. Experimental results show that the proposed accelerator achieves an energy efficiency of 32.73 GOPS/W, which is 34% higher than the existing solution, and a performance of 317.86 GOPS.
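The architecture described above can be modeled in software to make the data flow concrete. The following is a minimal behavioral sketch in Python, not the authors' hardware design: it assumes stride 1, no padding, and no bias, and every name in it (`adder_tree`, `mac_tree`, `stream_windows`, `conv_layer`) is hypothetical. The loops over output and input channels stand in for what the hardware would compute in parallel, and the generator stands in for the window buffer, which emits one K x K window per incoming pixel once it has filled.

```python
from collections import deque


def adder_tree(values):
    """Sum values pairwise, level by level, as a hardware adder tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels so every adder has two inputs
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def mac_tree(window, kernel):
    """All K*K multiplications at once, then reduce with the adder tree."""
    return adder_tree(w * k for w, k in zip(window, kernel))


def stream_windows(image, k):
    """Window buffer model: pixels arrive in raster order and are held in a
    shift register of (k - 1) * width + k entries; once it is full, each new
    pixel in a valid column yields one flattened k x k window."""
    height, width = len(image), len(image[0])
    buf = deque(maxlen=(k - 1) * width + k)
    for r in range(height):
        for c in range(width):
            buf.append(image[r][c])
            if r >= k - 1 and c >= k - 1:
                yield [buf[i * width + j] for i in range(k) for j in range(k)]


def conv_layer(inputs, weights, k):
    """inputs[ci][row][col]; weights[co][ci] is a flattened k x k kernel.
    Assumes stride 1, no padding, no bias."""
    n_in, n_out = len(inputs), len(weights)
    out_w = len(inputs[0][0]) - k + 1
    streams = [stream_windows(channel, k) for channel in inputs]
    flat = [[] for _ in range(n_out)]
    for windows in zip(*streams):          # one window position per step
        for co in range(n_out):            # output-channel parallel in hardware
            partial = [mac_tree(windows[ci], weights[co][ci])
                       for ci in range(n_in)]  # input-channel parallel in hardware
            flat[co].append(adder_tree(partial))
    return [[f[i:i + out_w] for i in range(0, len(f), out_w)] for f in flat]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
weights = [
    [[1, 0, 0, 1], [0, 0, 0, 0]],  # output 0: diagonal sum of channel 0 only
    [[1, 0, 0, 1], [1, 1, 1, 1]],  # output 1: same, plus window sum of channel 1
]
result = conv_layer([image, ones], weights, k=2)
print(result)
```

In this model the shift register holds only `(k - 1) * width + k` pixels per input channel, so each pixel is read from the source once and reused by every window that covers it. That reuse is the usual motivation for a window buffer in a pipelined convolution design.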