SD LG MM AS | Jun 20, 2019

A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting

Yue Gu, Zhihao Du, Hui Zhang, Xueliang Zhang

arXiv:1906.08415v1 | 3.72 citations

Originality: Incremental advance

AI Analysis

This work addresses noise robustness for small-footprint keyword spotting devices, but it appears incremental because it builds on existing enhancement and CNN methods.

The paper tackles noise robustness in keyword spotting (KWS) for real-world environments by jointly training a speech enhancement front-end with the KWS system; the authors report a significant improvement in noise robustness.

Robustness against noise is critical for keyword spotting (KWS) in real-world environments. To improve robustness, a speech enhancement front-end is used. Instead of treating speech enhancement as a separate preprocessing step before the KWS system, this study concatenates a pre-trained speech enhancement front-end and a convolutional neural network (CNN) based KWS system, with a feature transformation block that transforms the output of the enhancement front-end into the KWS system's input. The whole model is trained jointly, so that linguistic and other useful information from the KWS system can be back-propagated to the enhancement front-end to improve its performance. To fit small-footprint devices, a novel convolutional recurrent network is proposed, which needs fewer parameters and less computation without degrading performance. Furthermore, changing the input features from the power spectrogram to the Mel-spectrogram yields less computation and better performance. Our experimental results demonstrate that the proposed method significantly improves the KWS system with respect to noise robustness.
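
The abstract does not specify how the feature transformation block is built. As a rough illustration only, the sketch below assumes the block maps an enhanced power spectrum frame to log Mel-band energies using a standard triangular Mel filterbank. The function names (`mel_filterbank`, `feature_transform`), the 512-point FFT, the 16 kHz sample rate, the 40 Mel bands, and the log compression are all assumptions made for this example, not details from the paper. It shows why Mel features can cut downstream computation: each frame shrinks from 257 frequency bins to 40 bands.

```python
import math

def hz_to_mel(hz):
    # Common HTK-style Hz-to-Mel conversion.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Build triangular Mel filters as a list of n_mels rows,
    each with n_fft // 2 + 1 weights (hypothetical helper)."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, equally spaced on the Mel scale.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def feature_transform(enhanced_power, bank, eps=1e-8):
    """Assumed transformation: enhanced power spectrum frame -> log Mel energies."""
    return [math.log(sum(w * p for w, p in zip(row, enhanced_power)) + eps)
            for row in bank]

# Usage: one frame of (placeholder) enhanced power values.
bank = mel_filterbank(n_mels=40, n_fft=512, sample_rate=16000)
frame = [1.0] * 257
features = feature_transform(frame, bank)
print(len(frame), "->", len(features))  # 257 -> 40
```

In a jointly trained system, a transformation like this would have to be differentiable so that gradients from the KWS loss can reach the enhancement front-end. A fixed linear filterbank followed by a logarithm satisfies that, although a real implementation would use a framework with automatic differentiation rather than plain Python lists.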
