Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications
The paper addresses the problem of efficiently running deep learning models on resource-constrained devices for embedded systems applications, but its contribution is incremental because it is a review paper.
This survey reviews quantization techniques for deploying neural networks on microcontrollers, analyzing trade-offs between model performance and hardware constraints, and evaluating existing software and hardware platforms.
Deploying Quantized Neural Networks (QNNs) on resource-constrained devices such as microcontrollers poses significant challenges in balancing model performance, computational complexity, and memory constraints. Tiny Machine Learning (TinyML) addresses these issues by combining advances in machine learning algorithms, hardware acceleration, and software optimization to run deep neural networks efficiently on embedded systems. This survey presents a hardware-centric introduction to quantization and systematically reviews the essential quantization techniques used to accelerate deep learning models for embedded applications, with particular emphasis on the critical trade-offs between model performance and hardware capabilities. It also evaluates existing software frameworks and hardware platforms designed specifically to support QNN execution on microcontrollers. Finally, we analyze the current challenges and outline promising future directions in the rapidly evolving field of QNN deployment.
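The hardware-centric view of quantization rests on one idea: replace floating-point tensors with low-bit integers plus a few scaling parameters, so that inference runs mostly on integer arithmetic. The following is a minimal sketch, assuming per-tensor asymmetric (affine) quantization to signed 8-bit integers. This scheme is common in microcontroller inference but is not necessarily the one the survey emphasizes. The sketch shows how a scale and zero-point are chosen, how values are quantized and dequantized, and how a dot product can be accumulated in integers before a single rescale. The names `choose_qparams`, `quantize`, `dequantize`, and `int8_dot` are hypothetical and do not come from any particular framework.

```python
"""Minimal sketch of per-tensor affine (asymmetric) int8 quantization.

Assumption (illustrative only): real_value ~= scale * (q - zero_point),
with q restricted to the signed 8-bit range [-128, 127].
"""
from dataclasses import dataclass
from typing import List, Sequence

QMIN, QMAX = -128, 127  # signed 8-bit integer range


@dataclass(frozen=True)
class QParams:
    scale: float      # real-valued size of one integer step
    zero_point: int   # integer code that represents real 0.0


def choose_qparams(values: Sequence[float]) -> QParams:
    """Map [min, max] onto [QMIN, QMAX]; the range is widened to include
    0.0 so that zero is exactly representable (useful for padding/ReLU)."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) if hi > lo else 1.0
    zero_point = round(QMIN - lo / scale)
    zero_point = max(QMIN, min(QMAX, zero_point))
    return QParams(scale, zero_point)


def quantize(values: Sequence[float], p: QParams) -> List[int]:
    """Round to the nearest integer step, shift by the zero-point,
    and saturate to the int8 range."""
    return [max(QMIN, min(QMAX, round(v / p.scale) + p.zero_point)) for v in values]


def dequantize(q: Sequence[int], p: QParams) -> List[float]:
    """Recover approximate real values from integer codes."""
    return [p.scale * (qi - p.zero_point) for qi in q]


def int8_dot(qa: Sequence[int], pa: QParams, qb: Sequence[int], pb: QParams) -> float:
    """Integer-only multiply-accumulate (an MCU kernel would use an int32
    accumulator), rescaled to real units once at the end."""
    acc = sum((a - pa.zero_point) * (b - pb.zero_point) for a, b in zip(qa, qb))
    return pa.scale * pb.scale * acc


if __name__ == "__main__":
    w = [-0.50, 0.25, 0.75, 1.00]   # example weights (made up for illustration)
    x = [0.10, -0.20, 0.30, 0.40]   # example activations (made up for illustration)
    pw, px = choose_qparams(w), choose_qparams(x)
    qw, qx = quantize(w, pw), quantize(x, px)
    exact = sum(a * b for a, b in zip(w, x))
    approx = int8_dot(qw, pw, qx, px)
    print(f"int8 weights: {qw}")
    print(f"float dot = {exact:.4f}, int8 dot = {approx:.4f}")
```

Storing weights as int8 cuts memory four-fold relative to float32. The integer loop in `int8_dot` also avoids floating-point arithmetic, which many microcontrollers lack in hardware or execute more slowly. The small gap between the two printed dot products is the accuracy cost of that trade.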