Development of embedded target detection system based on FPGA and YOLOv3-Tiny
For developers of embedded AI systems, this work provides an efficient FPGA implementation of YOLOv3-Tiny with significant improvements in speed, power efficiency, and resource usage.
This paper presents an FPGA-based embedded target detection system using YOLOv3-Tiny, achieving 0.211 s inference latency (75.58% faster), 10.11 GOPS/W power efficiency (at least 29.45% better), and up to 51.94% reduction in hardware resource utilization compared to similar designs.
Computational complexity and storage requirements are crucial factors influencing the performance and efficiency of convolutional neural networks (CNNs) in resource-constrained environments. This paper presents a high-performance embedded target detection system based on FPGA and YOLOv3-Tiny, specifically designed for embedded artificial intelligence applications. By integrating lightweight CNN optimization techniques with hardware accelerator design, the system improves both computational efficiency and resource utilization. Key optimizations, including low-bit quantization, batch normalization fusion, and table lookup mapping, reduce model parameters and computational complexity. Additionally, an FPGA hardware accelerator with a pipelined architecture is developed to enhance the efficiency of convolution operations, while modular design and on-chip cache optimization minimize off-chip data transmission. On the ZYNQ-XC7Z035 platform, the system achieves an inference latency of 0.211 seconds, outperforming comparable designs by 75.58% in speed, and a power efficiency of 10.11 GOPS/W, surpassing comparable designs by at least 29.45%. Furthermore, hardware resource utilization is reduced by up to 51.94% compared to similar systems. This study offers innovative design methodologies and practical application examples for the efficient deployment of deep learning models on embedded platforms.
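Two of the optimizations named above, batch normalization fusion and low-bit quantization, can be illustrated with a minimal software sketch. It is not the paper's implementation: the function names, the 1x1 convolution used for brevity, the symmetric per-tensor scheme, and the 8-bit default are all assumptions made for illustration, since the abstract does not state the bit width or the quantization scheme actually used.

```python
import math


def fuse_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights holds one list of weights per output channel; every other
    argument holds one value per output channel. (Hypothetical helper.)
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        fused_w.append([w * scale for w in w_c])  # scale the weights
        fused_b.append((b_c - m) * scale + bt)    # absorb mean and shift
    return fused_w, fused_b


def conv1x1(x, weights, bias):
    """1x1 convolution at a single pixel: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Reference inference-time batch normalization, applied per channel."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def quantize_symmetric(values, bits=8):
    """Symmetric per-tensor quantization to signed integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale
```

After fusion, `conv1x1(x, fused_w, fused_b)` gives the same result as `batch_norm(conv1x1(x, weights, bias), ...)` up to floating-point rounding, so the accelerator needs no separate normalization stage at inference time. Quantizing the fused weights then shrinks storage and lets the convolution run on integer multipliers, at the cost of a rounding error of at most half a quantization step per weight.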