Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware
This work addresses efficient SNN inference for edge computing applications. The contribution is incremental: it builds on existing SNN training methods and focuses on deployment optimizations.
The paper tackles the challenge of deploying spiking neural networks (SNNs) on resource-constrained hardware by presenting a lightweight C-based runtime and optimizations that reduce latency and memory without sacrificing accuracy, achieving ~10x speedups on desktop CPU and enabling microcontroller deployment.
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices, together with optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from snnTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10x speedups on desktop CPU and additional gains with pruning, together with large memory reductions that enable microcontroller deployment (Arduino Portenta H7). Results indicate that SNNs can be executed efficiently on conventional embedded platforms when paired with an optimized runtime and spike-driven model compression. Code: https://github.com/karol-jurzec/snn-generator/
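To make the two ideas in the abstract concrete (static, preallocated layouts and pruning driven by spiking activity), the following is a minimal sketch in C. It is not taken from the linked repository: the type `lif_layer_t`, the functions `lif_init`, `lif_step`, and `lif_prune_inactive`, and the layer sizes `N_IN` and `N_OUT` are all hypothetical names chosen for illustration. It assumes a fully connected leaky integrate-and-fire layer with reset by subtraction, which simplifies the neuron update and may differ in detail from what snnTorch exports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layer sizes, fixed at compile time so no heap is needed. */
#define N_IN  4
#define N_OUT 3

/* All state lives in one statically sized struct (assumed layout). */
typedef struct {
    float    weights[N_IN][N_OUT]; /* one contiguous row per input neuron */
    float    membrane[N_OUT];      /* membrane potentials */
    uint8_t  active[N_OUT];        /* pruning mask: 0 means neuron is skipped */
    uint32_t spike_count[N_OUT];   /* spikes observed so far */
    float    beta;                 /* leak factor per time step */
    float    threshold;            /* firing threshold */
} lif_layer_t;

/* Zero the state and mark every neuron as active. */
void lif_init(lif_layer_t *layer, float beta, float threshold)
{
    assert(layer != NULL);
    memset(layer, 0, sizeof *layer);
    memset(layer->active, 1, sizeof layer->active);
    layer->beta = beta;
    layer->threshold = threshold;
}

/* One time step: leak, accumulate only over inputs that spiked, then fire. */
void lif_step(lif_layer_t *layer, const uint8_t in_spikes[N_IN],
              uint8_t out_spikes[N_OUT])
{
    assert(layer != NULL && in_spikes != NULL && out_spikes != NULL);

    for (size_t o = 0; o < N_OUT; ++o) {
        if (layer->active[o]) {
            layer->membrane[o] *= layer->beta;
        }
    }

    /* Event-driven accumulation: silent inputs cost no multiply-adds. */
    for (size_t i = 0; i < N_IN; ++i) {
        if (!in_spikes[i]) {
            continue;
        }
        for (size_t o = 0; o < N_OUT; ++o) {
            if (layer->active[o]) {
                layer->membrane[o] += layer->weights[i][o];
            }
        }
    }

    for (size_t o = 0; o < N_OUT; ++o) {
        out_spikes[o] = 0;
        if (layer->active[o] && layer->membrane[o] >= layer->threshold) {
            out_spikes[o] = 1;
            layer->membrane[o] -= layer->threshold; /* reset by subtraction */
            layer->spike_count[o] += 1;
        }
    }
}

/* Deactivate neurons that never spiked; returns how many were newly pruned. */
size_t lif_prune_inactive(lif_layer_t *layer)
{
    assert(layer != NULL);
    size_t pruned = 0;
    for (size_t o = 0; o < N_OUT; ++o) {
        if (layer->active[o] && layer->spike_count[o] == 0) {
            layer->active[o] = 0;
            ++pruned;
        }
    }
    return pruned;
}
```

In this sketch, a caller would run `lif_step` over a calibration set, call `lif_prune_inactive` once, and then continue inference with the reduced set of active neurons. A real runtime would likely go further and compact the weight arrays so that pruned neurons also free memory; the mask is used here only to keep the example short.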