SD, LG, MM, AS | Jun 20, 2019

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

arXiv:1906.08415v1 | 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses robustness for small-footprint keyword spotting devices, but it appears incremental because it builds on existing enhancement and CNN methods.

The paper tackles the problem of noise robustness in keyword spotting (KWS) for real-world environments by proposing a jointly trained speech enhancement front-end and KWS system, which significantly improves noise robustness.

Robustness against noise is critical for keyword spotting (KWS) in real-world environments. To improve robustness, a speech enhancement front-end is used. Instead of treating speech enhancement as a separate preprocessing step before the KWS system, this study concatenates a pre-trained speech enhancement front-end with a convolutional neural network (CNN) based KWS system, using a feature transformation block to convert the output of the enhancement front-end into the KWS system's input. The whole model is trained jointly, so that linguistic and other useful information from the KWS system can be back-propagated to the enhancement front-end to improve its performance. To fit small-footprint devices, a novel convolutional recurrent network is proposed, which needs fewer parameters and less computation without degrading performance. Furthermore, changing the input features from the power spectrogram to the Mel-spectrogram reduces computation and improves performance. Our experimental results demonstrate that the proposed method significantly improves the noise robustness of the KWS system.
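The abstract does not spell out how the power spectrogram is mapped to a Mel-spectrogram, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It builds a triangular Mel filterbank and applies it to a single power-spectrum frame, which shows why the Mel representation is cheaper downstream: each frame shrinks from 257 frequency bins to 40 Mel bands. The function names and the settings (16 kHz sampling rate, 512-point FFT, 40 Mel bands, log compression) are assumptions chosen for illustration and do not come from the paper.

```python
import math


def hz_to_mel(hz):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters, each over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of every FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            # Triangle: zero outside [lo, hi], peak of 1 at mid.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_to_log_mel(power_frame, bank, eps=1e-10):
    """Project one power-spectrum frame onto the filterbank, then take the log."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_frame)) + eps)
        for row in bank
    ]


# Assumed settings for illustration only (not taken from the paper).
bank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
frame = [1.0] * 257  # a flat power spectrum standing in for enhanced output
features = power_to_log_mel(frame, bank)
```

In a jointly trained system, a projection like this would sit between the enhancement front-end and the KWS network and would need to be differentiable so that gradients from the KWS loss can reach the front-end; a fixed matrix multiplication followed by a logarithm satisfies that requirement.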