Evaluation of Convolution Primitives for Embedded Neural Networks on 32-bit Microcontrollers
This work addresses efficient neural network inference for embedded systems developers. Its contribution is incremental: it implements and benchmarks existing primitives rather than introducing new ones.
The paper tackles the challenge of deploying neural networks on 32-bit microcontrollers by implementing and evaluating several convolution primitives. Its benchmark reveals a linear relationship between theoretical MACs and energy consumption, and shows that computationally efficient primitives such as shift convolution reduce latency and energy consumption.
Deploying neural networks on constrained hardware platforms such as 32-bit microcontrollers is a challenging task because of the large memory, computing and energy requirements of their inference process. To tackle these issues, several convolution primitives have been proposed to make the standard convolution more computationally efficient. However, few of these primitives have actually been implemented for 32-bit microcontrollers. In this work, we collect different state-of-the-art convolution primitives and propose an implementation for the ARM Cortex-M processor family with an open source deployment platform (NNoM). We then carry out experimental characterization tests on these implementations. Our benchmark reveals a linear relationship between theoretical multiply-accumulate operations (MACs) and energy consumption, which shows the advantage of using computationally efficient primitives such as shift convolution. We discuss the significant reduction in latency and energy consumption due to the use of SIMD instructions and highlight the importance of data reuse in those performance gains. For reproducibility and further experiments, the code and experiments are publicly available.
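To make the notion of theoretical MACs concrete, the following minimal sketch counts them for a few convolution primitives using the usual textbook formulas. It is an illustration, not the benchmark code of this work: the function names and the example layer shape are hypothetical, and apart from shift convolution the set of primitives shown (standard, grouped, depthwise separable) is assumed for comparison. Shift convolution is modeled here as a zero-MAC channel shift followed by a pointwise (1x1) convolution.

```python
# Theoretical MAC counts for one convolution layer (bias ignored).
# All function names are hypothetical and chosen for illustration only.

def standard_conv_macs(h_out, w_out, c_in, c_out, k):
    """Standard k x k convolution: every output value sums over c_in * k * k inputs."""
    return h_out * w_out * c_in * c_out * k * k


def grouped_conv_macs(h_out, w_out, c_in, c_out, k, groups):
    """Grouped convolution: each output channel only sees c_in / groups input channels."""
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("c_in and c_out must be divisible by groups")
    return h_out * w_out * (c_in // groups) * c_out * k * k


def depthwise_separable_macs(h_out, w_out, c_in, c_out, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = h_out * w_out * c_in * k * k
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise


def shift_conv_macs(h_out, w_out, c_in, c_out):
    """Shift convolution, assumed here to be a zero-MAC shift plus a pointwise 1 x 1 convolution."""
    return h_out * w_out * c_in * c_out


if __name__ == "__main__":
    # Hypothetical layer: 32 x 32 output map, 16 input and 16 output channels, 3 x 3 kernel.
    shape = dict(h_out=32, w_out=32, c_in=16, c_out=16)
    print("standard           :", standard_conv_macs(**shape, k=3))
    print("grouped (2 groups) :", grouped_conv_macs(**shape, k=3, groups=2))
    print("depthwise separable:", depthwise_separable_macs(**shape, k=3))
    print("shift              :", shift_conv_macs(**shape))
```

For this hypothetical layer, the shift variant needs 9 times fewer MACs than the standard 3x3 convolution. If energy does scale linearly with theoretical MACs, as the benchmark reports, a comparable reduction in energy would be expected, although the exact slope and offset depend on the measured platform.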