LG, May 19, 2025

An Overview of Arithmetic Adaptations for Inference of Convolutional Neural Networks on Re-configurable Hardware

arXiv:2505.13575v1, 2 citations, h-index: 2, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses deployment inefficiencies for CNNs on embedded platforms, but it is incremental as it applies existing optimization techniques to a specific hardware setup.

The paper tackles the challenge of deploying Convolutional Neural Networks (CNNs) on resource-constrained re-configurable hardware such as FPGAs. It presents best-practice approaches, including batch normalization fusion, filter pruning, and post-training quantization, for a TinyYOLOv3 detector on a XILINX Artix-7 FPGA.

Convolutional Neural Networks (CNNs) have become a popular tool for computer vision tasks and are therefore used in a wide range of applications. Many different concepts, such as single-shot detectors, have been published for detecting objects in images or video streams. However, CNNs are difficult to deploy on embedded platforms, in particular on re-configurable hardware such as Field Programmable Gate Arrays (FPGAs). Because of their high computational intensity, memory requirements, and arithmetic conditions, a variety of strategies for running CNNs on FPGAs have been developed. The following methods showcase our best-practice approaches for a TinyYOLOv3 detector network on a XILINX Artix-7 FPGA, using techniques such as batch normalization fusion, filter pruning, and post-training network quantization.
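
Two of the techniques named in the abstract, batch normalization fusion and post-training quantization, can be illustrated with a short sketch. This is not the authors' implementation: the function names (`fuse_batch_norm`, `quantize_symmetric`, `conv_output`) are hypothetical, weights are represented as flat per-output-channel lists for simplicity, and symmetric per-tensor 8-bit quantization is an assumption, since the abstract does not state which quantization scheme is used.

```python
import math


def fuse_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout)
    bias, gamma, beta, mean, var: one float per output channel
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        # BN(y) = g * (y - mu) / sqrt(v + eps) + bt, with y = w . x + b
        scale = g / math.sqrt(v + eps)
        fused_w.append([scale * w for w in w_c])
        fused_b.append(scale * (b_c - mu) + bt)
    return fused_w, fused_b


def quantize_symmetric(values, num_bits=8):
    """Symmetric per-tensor quantization (assumed scheme).

    Returns the integer codes and the scale needed to dequantize them.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def conv_output(w_c, b_c, x):
    """Output of one channel at one position: dot product plus bias."""
    return sum(w * xi for w, xi in zip(w_c, x)) + b_c
```

After fusion, the batch normalization layer no longer needs to be evaluated at inference time, because the fused convolution reproduces its output up to floating-point rounding. Quantizing the fused weights afterwards introduces an error of at most half a quantization step per weight under the assumed scheme.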
