NE · INS-DET · May 21

Quantization Effects of Artificial Neural Networks for Embedded Edge-Computing Applications

Alperen Aksoy, Ilja Bekman, Vesselin Dimitrov, Qader Dorosti, Chimezie Eguzo, Sarah Fleitmann, Christian Grewing, Fabian Hader, Andre Zambanini, Stefan van Waasen

arXiv:2511.05479 · 3.9 · h-index: 18

Predicted impact: top 73% in NE · last 90 days · Originality: Incremental advance

AI Analysis

For developers of resource-constrained scientific applications (e.g., qubit calibration, particle detectors), this work provides practical quantization trade-offs and a novel BNN training method for ultra-low-latency inference.

This paper evaluates quantization techniques (PTQ, QAT, BNNs) for neural networks on embedded edge devices, achieving a four-fold memory reduction with PTQ while maintaining accuracy, and proposing a GA-based BNN training method that achieves 10-15 ns inference latency without specialized hardware.

This paper examines the use of Quantized Neural Networks (QNNs) for two resource-constrained scientific applications: automated calibration of semiconductor quantum bits (qubits) and scientific particle detectors. We evaluate the trade-offs between Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and ultra-low-bit Binary Neural Networks (BNNs) with respect to latency and resource usage. Our results demonstrate that PTQ achieves a four-fold reduction in memory usage for U-shaped CNN (U-Net) architectures while maintaining or slightly enhancing segmentation accuracy (e.g., from 89% to 90% for a small U-Net with 447 parameters). For the training of non-differentiable custom BNNs, we propose a novel, hardware-constrained learning approach using Genetic Algorithms (GAs). We showcase a LUT-based BNN architecture suitable for direct conversion to VHDL via the HCL4BNN framework. This method achieves nanosecond-scale inference latencies (10-15 ns) without requiring specialized DSP or BRAM resources.
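A four-fold memory reduction is what one would expect if PTQ maps 32-bit floating-point weights to 8-bit integers. The abstract does not state the bit widths, so that mapping is an assumption here. The sketch below is not the authors' pipeline: it applies a generic symmetric per-tensor int8 scheme to random placeholder weights, and the helper names `quantize_int8` and `dequantize` are hypothetical. It uses only the Python standard library.

```python
from array import array
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto the int8 range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

random.seed(0)
# 447 mirrors the parameter count quoted in the abstract; the values are random
# placeholders, not trained U-Net weights.
weights = array("f", (random.gauss(0.0, 0.1) for _ in range(447)))

q, scale = quantize_int8(weights)

float_bytes = len(weights) * weights.itemsize  # 4 bytes per float32 weight
int8_bytes = len(q) * q.itemsize               # 1 byte per int8 weight
memory_ratio = float_bytes / int8_bytes

# Rounding to the nearest step bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))

print(f"memory ratio: {memory_ratio}")
print(f"max abs error: {max_error:.6f} (half step = {scale / 2:.6f})")
```

The memory ratio follows directly from the storage widths (4 bytes versus 1 byte per weight). The effect on segmentation accuracy depends on the trained model and the data, and this sketch does not capture it.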
