LG · AR · PF · PL · Apr 12, 2023

MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

arXiv:2304.05544v1 · 5 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building efficient inference runtimes for TinyML systems, which matters for resource-constrained microcontrollers. The advance is incremental because it builds on existing optimization techniques.

The paper tackles the problem of minimizing external memory accesses for matrix multiplication in TinyML on microcontrollers by introducing the MEMA framework, which analytically determines optimized schedules and kernels, resulting in up to a 1.8x speedup and 44% energy reduction compared to CMSIS-NN on ARM Cortex-M4.

We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in current practice: optimal schedules tend to be found only through a time-consuming, heuristic search of a large scheduling space. We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems. For example, for neural network benchmarks on the ARM Cortex-M4, we achieve up to a 1.8x speedup and 44% energy reduction over CMSIS-NN.
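To make the idea of picking a schedule from a memory-access model more concrete, here is a minimal sketch in Python. It is not the MEMA algorithm and does not reproduce the paper's results. It assumes a simplified output-stationary tiling of C = A x B, a hypothetical on-chip buffer of `capacity` words that must hold one C tile plus one column slice of A and one row slice of B, and a closed-form count of external words moved. The function names `external_accesses` and `best_tile` are illustrative only.

```python
from math import ceil


def external_accesses(M, K, N, tm, tn):
    """Words moved to or from external memory for C[MxN] = A[MxK] @ B[KxN].

    Assumed model: each tm x tn tile of C stays on chip while K is streamed.
    """
    a_reads = M * K * ceil(N / tn)  # A is re-read once per column block of C
    b_reads = K * N * ceil(M / tm)  # B is re-read once per row block of C
    c_writes = M * N                # every output element is written once
    return a_reads + b_reads + c_writes


def best_tile(M, K, N, capacity):
    """Return (cost, tm, tn) minimizing the modeled accesses, or None.

    Assumed buffer constraint: tm*tn words for the C tile, plus tm words
    of A and tn words of B for the current step along K.
    """
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            if tm * tn + tm + tn > capacity:
                continue  # tile does not fit in the on-chip buffer
            cost = external_accesses(M, K, N, tm, tn)
            if best is None or cost < best[0]:
                best = (cost, tm, tn)
    return best


if __name__ == "__main__":
    # Hypothetical 64x64x64 problem with a 288-word on-chip buffer.
    print(best_tile(64, 64, 64, 288))
    # Untiled baseline (1x1 tiles) under the same model, for comparison.
    print(external_accesses(64, 64, 64, 1, 1))
```

Because the cost is a closed-form expression, every candidate tile shape is scored without running a kernel on hardware, which is the contrast with measurement-driven schedule search that the abstract draws. A real framework would also have to model kernel register usage and data types, which this sketch ignores.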

