LG | May 19, 2025

An Overview of Arithmetic Adaptations for Inference of Convolutional Neural Networks on Re-configurable Hardware

Ilkay Wunderlich, Benjamin Koch, Sven Schönfeld

arXiv:2505.13575

Originality: Synthesis-oriented

AI Analysis

This work addresses deployment inefficiencies for CNNs on embedded platforms, but it is incremental as it applies existing optimization techniques to a specific hardware setup.

The paper tackles the challenge of deploying Convolutional Neural Networks (CNNs) on resource-constrained re-configurable hardware such as FPGAs. It presents best-practice approaches, namely batch normalization fusion, filter pruning, and post-training quantization, for a TinyYOLOv3 detector on a XILINX Artix-7 FPGA.

Convolutional Neural Networks (CNNs) have become a popular tool for computer vision tasks and are therefore used in a wide range of applications. Many different concepts, such as single-shot detectors, have been published for detecting objects in images or video streams. However, CNNs are difficult to deploy on embedded platforms, including re-configurable hardware such as Field Programmable Gate Arrays (FPGAs). Because of their high computational intensity, memory requirements and arithmetic conditions, a variety of strategies for running CNNs on FPGAs have been developed. The following methods showcase our best-practice approaches for a TinyYOLOv3 detector network on a XILINX Artix-7 FPGA, using techniques such as fusion of batch normalization, filter pruning and post-training network quantization.
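
Of the techniques named in the abstract, fusion of batch normalization is the easiest to illustrate in a few lines. The sketch below is not taken from the paper. It shows the commonly used formulation, in which the per-channel batch-norm scale and shift are folded into the weights and bias of the preceding convolution so that no separate normalization step is needed at inference time. All function and parameter names (`fuse_batch_norm`, `conv_at_one_position`, `batch_norm`, `eps`) are hypothetical, and each filter is represented as a flat list of floats to keep the example dependency-free.

```python
import math


def fuse_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into the preceding convolution.

    weights: one flat list of floats per output channel (hypothetical layout)
    bias, gamma, beta, mean, var: one float per output channel
    Returns (fused_weights, fused_bias).
    """
    fused_weights, fused_bias = [], []
    for w, b, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        # BN(y) = g * (y - mu) / sqrt(v + eps) + bt, with y = w . x + b
        scale = g / math.sqrt(v + eps)
        fused_weights.append([scale * wi for wi in w])
        fused_bias.append(scale * (b - mu) + bt)
    return fused_weights, fused_bias


def conv_at_one_position(weights, bias, patch):
    """Evaluate every filter on one flattened input patch (a dot product)."""
    return [
        sum(wi * xi for wi, xi in zip(w, patch)) + b
        for w, b in zip(weights, bias)
    ]


def batch_norm(values, gamma, beta, mean, var, eps=1e-5):
    """Reference inference-time batch normalization, one value per channel."""
    return [
        g * (y - mu) / math.sqrt(v + eps) + bt
        for y, g, bt, mu, v in zip(values, gamma, beta, mean, var)
    ]
```

Applying `conv_at_one_position` with the fused parameters should give the same result, up to floating-point rounding, as running the original convolution followed by `batch_norm`. On an FPGA this removes one multiply-add stage per layer, which is the usual motivation for the technique.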
