LG, AR · Mar 28, 2025

Benchmarking Ultra-Low-Power $μ$NPUs

arXiv:2503.22567v3 · 7 citations · h-index: 16 · Has Code · MOBICOM
Originality: Synthesis-oriented
AI Analysis

This work provides a foundation for evaluating μNPU platforms and offers practical insights for hardware and software developers in the ultra-low-power domain. Its contribution is incremental, however, because it benchmarks existing platforms rather than introducing new methods.

The authors benchmark ultra-low-power neural processing units (μNPUs) for on-device inference, conducting the first comparative evaluation of commercially available platforms. The results reveal unexpected performance disparities between platforms and unexpected scaling behavior as model complexity grows.

Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference. This has sparked recent development of microcontroller-scale NN accelerators, also known as neural processing units (μNPUs), designed specifically for ultra-low-power applications. We present the first comparative evaluation of a number of commercially available μNPUs, including the first independent benchmarks for multiple platforms. To ensure fairness, we develop and open-source a model compilation pipeline supporting consistent benchmarking of quantized models across diverse microcontroller hardware. Our resulting analysis uncovers both expected performance trends and surprising disparities between hardware specifications and actual performance, including certain μNPUs exhibiting unexpected scaling behaviors with model complexity. This work provides a foundation for ongoing evaluation of μNPU platforms and offers practical insights for both hardware and software developers in this rapidly evolving space.
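The gap between hardware specifications and actual performance mentioned in the abstract can be made concrete with a small calculation. The sketch below is not the authors' pipeline. It assumes each benchmark run yields a measured latency and energy per inference, and it compares the throughput implied by those measurements with a vendor-quoted peak. The `Measurement` type, its field names, and all numbers are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark run. All fields are illustrative assumptions."""
    platform: str      # hypothetical label, not a real device
    peak_gops: float   # vendor-quoted peak throughput, in giga-ops/s
    model_macs: int    # multiply-accumulates per inference
    latency_s: float   # measured latency per inference, in seconds
    energy_j: float    # measured energy per inference, in joules


def effective_gops(m: Measurement) -> float:
    """Throughput actually achieved, counting one MAC as two operations."""
    return 2 * m.model_macs / m.latency_s / 1e9


def utilization(m: Measurement) -> float:
    """Fraction of the quoted peak throughput that was achieved."""
    return effective_gops(m) / m.peak_gops


def inferences_per_joule(m: Measurement) -> float:
    """Energy efficiency expressed as inferences per joule."""
    return 1.0 / m.energy_j


# Placeholder numbers chosen only to exercise the arithmetic.
run = Measurement(
    platform="example-npu",
    peak_gops=50.0,
    model_macs=5_000_000,
    latency_s=0.002,
    energy_j=0.0005,
)

print(f"{run.platform}: {effective_gops(run):.2f} GOPS achieved, "
      f"{utilization(run):.1%} of quoted peak, "
      f"{inferences_per_joule(run):.0f} inferences/J")
```

A utilization well below 1.0 is the kind of specification-versus-measurement disparity the abstract describes. Repeating the calculation across models of increasing `model_macs` would show how that ratio changes with model complexity.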

