LG | May 10

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

Wiktor Matykiewicz, Piotr Wzorek, Kamil Jeziorek, Tomás Muñoz, Antonio Rios-Navarro, Angel Jiménez-Fernández, Tomasz Kryjak

arXiv:2605.09570

AI Analysis

Enables efficient, real-time keyword spotting on edge devices for mobile robotics and embedded intelligence.

This work presents the first end-to-end FPGA implementation of a keyword spotting system integrating a Neuromorphic Auditory Sensor and a graph neural network, achieving 87.43% accuracy on Google Speech Commands v2 with <35 μs latency and 1.12 W power consumption.

With the rapid growth of mobile robotics and embedded intelligence, there is an increasing demand for efficient on-device data processing on edge platforms. A promising research direction is the use of neuromorphic sensors inspired by human sensory systems, which generate sparse, event-based data encoding changes in the environment. In this work, we present the first end-to-end FPGA implementation of a keyword spotting system that integrates a Neuromorphic Auditory Sensor (NAS) and a graph neural network (GNN) on a single FPGA device, enabling real-time processing of raw audio data. The proposed architecture eliminates conventional signal preprocessing and operates directly on event-based audio streams. Leveraging a compute-near-memory network architecture, the system achieves efficient inference with low latency and low power consumption. Experimental results demonstrate an accuracy of 87.43% after quantization on the Google Speech Commands v2 dataset processed through the neuromorphic sensor, with end-to-end latency below 35 μs and average power consumption of 1.12 W. The processed datasets, software models, and hardware modules are available at https://github.com/vision-agh/NAS-GNN-KWS.
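The abstract does not say how the event stream is turned into a graph, so the following is only an illustrative sketch of one common approach for event-based GNNs: each event becomes a node and is linked to earlier events that are close in time and frequency channel. The event layout `(timestamp_us, channel)`, the function name `build_event_graph`, and the thresholds `max_dt_us` and `max_dch` are all assumptions made for this example, not details taken from the paper or its repository.

```python
from typing import List, Tuple

# Hypothetical event format: (timestamp in microseconds, frequency channel index).
Event = Tuple[int, int]


def build_event_graph(
    events: List[Event],
    max_dt_us: int = 1000,  # assumed temporal neighbourhood
    max_dch: int = 2,       # assumed channel neighbourhood
) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) from an earlier event i to a later event j.

    Events must be sorted by timestamp. Edges only point forward in time,
    so the graph can be built incrementally as events arrive.
    """
    edges: List[Tuple[int, int]] = []
    start = 0  # index of the oldest event still inside the time window
    for j, (t_j, ch_j) in enumerate(events):
        # Slide the window start past events that are too old for event j.
        while events[start][0] < t_j - max_dt_us:
            start += 1
        # Link event j to every earlier in-window event on a nearby channel.
        for i in range(start, j):
            if abs(events[i][1] - ch_j) <= max_dch:
                edges.append((i, j))
    return edges


# Toy stream of five events, made up for illustration.
events = [(0, 3), (200, 4), (250, 10), (900, 5), (2500, 4)]
edges = build_event_graph(events)
print(edges)
```

Because every edge points from an older event to a newer one, a structure like this can be updated one event at a time, which is one reason such graphs are attractive for streaming hardware. Whether the authors use this particular neighbourhood rule is not stated in the abstract.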

