On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems
This work enables on-device adaptation for keyword spotting on low-power edge devices and addresses noise robustness in real-world applications, though its improvement over existing methods is incremental.
The paper tackles the problem of keyword spotting accuracy degradation in noisy environments by proposing a fully on-device domain adaptation system, achieving up to 14% accuracy gains over robust models and recovering 5% accuracy with only 100 labeled utterances.
Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial to recovering the lost accuracy, and on-device learning is required to ensure that the adaptation happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system achieving up to 14% accuracy gains over already-robust keyword spotting models. We enable on-device learning with less than 10 kB of memory, using only 100 labeled utterances to recover 5% accuracy after adapting to complex speech noise. We demonstrate that domain adaptation can be achieved on ultra-low-power microcontrollers with as little as 806 mJ of energy in only 14 s, on always-on, battery-operated devices.
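To put the reported energy and latency in perspective, the short sketch below converts them into an average power draw. It assumes that the 806 mJ and 14 s figures describe the same adaptation run, which the abstract suggests but does not state explicitly. The function and constant names are illustrative and do not come from the paper.

```python
def average_power_mw(energy_mj: float, duration_s: float) -> float:
    """Return average power in milliwatts (mJ divided by s gives mW)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return energy_mj / duration_s


# Figures quoted in the abstract; assumed here to belong to the same run.
ADAPT_ENERGY_MJ = 806.0
ADAPT_TIME_S = 14.0

avg_mw = average_power_mw(ADAPT_ENERGY_MJ, ADAPT_TIME_S)
print(f"Average power during adaptation: {avg_mw:.1f} mW")
# prints "Average power during adaptation: 57.6 mW"
```

Under that assumption, the adaptation phase averages roughly 58 mW over its 14 s duration. This is a derived estimate, not a figure reported by the authors.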