SD MM AS · Oct 22, 2018

Investigation of Monaural Front-End Processing for Robust ASR without Retraining or Joint-Training

arXiv:1810.09067v2 · 15 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of making ASR robust in noisy environments, a problem relevant to many speech technology applications. It is incremental, however: it applies existing supervised separation methods to a new task.

The paper investigates whether monaural speech separation can serve as front-end processing that improves automatic speech recognition (ASR) without retraining or joint-training the acoustic model. It finds that this yields relative WER reductions of 36.40% for GMM-based and 11.78% for DNN-based ASR on the CHiME-3 challenge.
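The reported figures are relative WER reductions, measured against each system's own baseline WER, not absolute percentage points. For instance, a hypothetical baseline of 20% WER reduced by 36.40% relative would fall to about 12.72%, not to negative values. The sketch below uses the standard WER definition (word-level edit distance divided by the number of reference words) and the usual relative-reduction formula. The function names `word_error_rate` and `relative_wer_reduction` are illustrative and not taken from the paper or any scoring toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction of the baseline WER removed by the front-end (0.3640 means 36.40%)."""
    return (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical numbers, for illustration only (not results from the paper).
print(word_error_rate("the cat sat on the mat", "the bat sat on mat"))
print(relative_wer_reduction(20.0, 15.0))
```

Because the reduction is relative, the same 36.40% would correspond to different absolute gains on systems with different baseline WERs.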

In recent years, monaural speech separation has been formulated as a supervised learning problem. It has been systematically studied and shown to dramatically improve speech intelligibility and quality for human listeners. However, it has not been well investigated whether these methods can be employed as front-end processing to directly improve the performance of a machine listener, i.e., an automatic speech recognizer, without retraining or joint-training the acoustic model. In this paper, we explore the effectiveness of independent front-end processing for multi-condition-trained ASR on the CHiME-3 challenge. We find that directly feeding the enhanced features to the ASR yields relative WER reductions of 36.40% and 11.78% for the GMM-based and DNN-based ASR, respectively. We also investigate the effect of the noisy phase and the generalization ability under unmatched noise conditions.
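The "noisy phase" issue arises because many supervised separation front-ends estimate only a time-frequency mask or magnitude. The enhanced signal is then rebuilt by reusing the phase of the noisy mixture. The sketch below shows that step under stated assumptions. It assumes a ratio-mask front-end and uses an oracle ideal ratio mask, computed from the known clean and noise signals, as a stand-in for a trained DNN's estimate. The paper's actual training target, network, and STFT settings are not given here. `FS`, `NPERSEG`, `ideal_ratio_mask`, `enhance_with_noisy_phase`, and `snr_db` are hypothetical names. The sketch stops at a resynthesized waveform so the phase reuse is visible. An unchanged ASR would then compute its own features from this output (or from the enhanced magnitude directly).

```python
import numpy as np
from scipy.signal import stft, istft

FS = 16000     # sampling rate in Hz (assumed)
NPERSEG = 512  # STFT window length in samples (hypothetical choice)


def ideal_ratio_mask(clean_spec, noise_spec, beta=0.5):
    """Oracle ratio mask (|S|^2 / (|S|^2 + |N|^2))^beta, standing in for a DNN estimate."""
    s_pow = np.abs(clean_spec) ** 2
    n_pow = np.abs(noise_spec) ** 2
    return (s_pow / (s_pow + n_pow + 1e-12)) ** beta


def enhance_with_noisy_phase(noisy, mask):
    """Scale the noisy STFT magnitude by the mask and resynthesize with the noisy phase."""
    _, _, noisy_spec = stft(noisy, fs=FS, nperseg=NPERSEG)
    # Equivalent to mask * noisy_spec for a real mask; written out to make the
    # reuse of the noisy phase explicit.
    enhanced_spec = mask * np.abs(noisy_spec) * np.exp(1j * np.angle(noisy_spec))
    _, enhanced = istft(enhanced_spec, fs=FS, nperseg=NPERSEG)
    return enhanced[: len(noisy)]  # istft can return a few extra padded samples


def snr_db(reference, estimate):
    """SNR of an estimate relative to the clean reference, in dB."""
    return 10 * np.log10(np.sum(reference**2) / np.sum((reference - estimate) ** 2))


# Synthetic example: a 440 Hz tone in white noise at roughly -3 dB SNR.
rng = np.random.default_rng(0)
t = np.arange(FS) / FS
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = 0.5 * rng.standard_normal(FS)
noisy = clean + noise

# Oracle mask from the separate clean and noise spectra (same STFT settings).
_, _, clean_spec = stft(clean, fs=FS, nperseg=NPERSEG)
_, _, noise_spec = stft(noise, fs=FS, nperseg=NPERSEG)
mask = ideal_ratio_mask(clean_spec, noise_spec)

enhanced = enhance_with_noisy_phase(noisy, mask)
snr_in = snr_db(clean, noisy)
snr_out = snr_db(clean, enhanced)
print(f"input SNR {snr_in:.1f} dB, output SNR {snr_out:.1f} dB")
```

Reusing the noisy phase keeps the front-end simple, but any phase error it leaves in the enhanced signal is one of the effects the abstract says the paper examines.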
