SDLG · Sep 8, 2025

End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers

arXiv:2509.07051v1 · 1 citation · h-index: 4 · SENSORS
Originality: Incremental advance
AI Analysis

This work addresses efficient, low-latency KWS for embedded and IoT devices, offering incremental improvements through system-level optimization.

The paper tackled the challenge of deploying keyword spotting (KWS) on embedded microcontrollers with strict memory and energy constraints by evaluating lightweight neural networks and proposing TKWS, which achieved up to 92.4% F1-score with only 14.4k parameters. It emphasized that real-world effectiveness depends on optimizing the entire processing pipeline and hardware, not just model accuracy.

Keyword spotting (KWS) is a key enabling technology for hands-free interaction in embedded and IoT devices, where stringent memory and energy constraints challenge the deployment of AI-enabled devices. In this work, we systematically evaluate and compare several state-of-the-art lightweight neural network architectures, including DS-CNN, LiCoNet, and TENet, alongside our proposed Typman-KWS (TKWS) architecture, which is built upon MobileNet and specifically designed for efficient KWS on microcontroller units (MCUs). Unlike prior studies focused solely on model inference, our analysis encompasses the entire processing pipeline, from Mel-Frequency Cepstral Coefficient (MFCC) feature extraction to neural inference, and is benchmarked across three STM32 platforms (N6, H7, and U5). Our results show that TKWS with three residual blocks achieves up to 92.4% F1-score with only 14.4k parameters, reducing memory footprint without compromising accuracy. Moreover, the N6 MCU with integrated neural acceleration achieves the best energy-delay product (EDP), enabling efficient, low-latency operation even with high-resolution features. Our findings highlight that model accuracy alone does not determine real-world effectiveness; rather, optimal keyword spotting deployments require careful consideration of feature extraction parameters and hardware-specific optimization.
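The energy-delay product mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition EDP = energy × latency, with latency taken over the whole pipeline (MFCC extraction plus inference) and energy approximated as average power × latency. The class, function, and field names are hypothetical and not taken from the paper, and the byte widths passed to `weight_bytes` (1 for int8, 4 for float32) are assumptions about how the 14.4k parameters might be stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMeasurement:
    """One end-to-end measurement for a platform (hypothetical structure)."""
    name: str
    mfcc_latency_s: float       # time spent on MFCC feature extraction
    inference_latency_s: float  # time spent on neural inference
    avg_power_w: float          # average power over the whole pipeline

    @property
    def latency_s(self) -> float:
        # End-to-end latency covers feature extraction and inference.
        return self.mfcc_latency_s + self.inference_latency_s

    @property
    def energy_j(self) -> float:
        # Assumption: energy is approximated as average power times latency.
        return self.avg_power_w * self.latency_s

    @property
    def edp(self) -> float:
        # Energy-delay product: lower is better.
        return self.energy_j * self.latency_s


def best_by_edp(measurements):
    """Return the measurement with the lowest energy-delay product."""
    return min(measurements, key=lambda m: m.edp)


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Storage needed for the weights alone (excludes activations and code)."""
    return num_params * bytes_per_param
```

With real measurements in hand, `best_by_edp` would pick the platform that balances energy and latency best. `weight_bytes(14_400, 1)` gives the weight storage for the reported parameter count under the int8 assumption.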

