Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHiME' Challenge
This work addresses noise robustness for speech recognition systems, particularly in challenge settings like CHiME, but it is incremental as it builds on existing methods.
The paper tackles speech separation and recognition in noisy environments. It combines a front-end multi-channel Wiener filter (MWF), whose noise-reduction/distortion tradeoff parameter is optimized, with a back-end built on DNN, CNN, and LSTM models, lattice rescoring, and ROVER. Experiments show improved ASR performance.
This paper presents our contribution to the third 'CHiME' speech separation and recognition challenge, covering both front-end signal processing and back-end speech recognition. In the front-end, a Multi-channel Wiener filter (MWF) is designed to reduce background noise. Unlike the traditional MWF, the parameter that controls the tradeoff between noise reduction and target-signal distortion is optimized according to the desired noise reduction level. In the back-end, several techniques are used to improve noisy Automatic Speech Recognition (ASR) performance: Deep Neural Network (DNN), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) models with a medium vocabulary, lattice rescoring with a large-vocabulary language model finite-state transducer, and the ROVER scheme. Experimental results show that the proposed system, combining the front-end and back-end, effectively improves ASR performance.
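The front-end tradeoff can be illustrated with the speech-distortion-weighted MWF (SDW-MWF). This is a standard formulation in which a parameter mu weights noise reduction against speech distortion: w = (R_ss + mu R_nn)^{-1} R_ss e_ref. The abstract gives neither the exact filter nor the rule used to pick the parameter. What follows is therefore a minimal per-frequency-bin sketch under stated assumptions:

- Speech and noise are uncorrelated.
- The noise covariance R_nn is available, for example estimated from speech-absent frames.
- The helper names are hypothetical.
- The grid search in `choose_mu`, which picks the smallest mu that reaches a target noise reduction in dB, is an illustration and not the authors' method.

```python
import numpy as np


def sdw_mwf_weights(R_yy, R_nn, mu, ref=0):
    """Speech-distortion-weighted MWF for a single frequency bin (sketch).

    R_yy: (M, M) Hermitian covariance of the noisy microphone signals.
    R_nn: (M, M) Hermitian noise covariance (assumed known or pre-estimated).
    mu:   tradeoff parameter; mu = 1 is the standard MWF, larger mu
          removes more noise at the cost of more speech distortion.
    ref:  index of the reference microphone.
    """
    # Speech covariance, assuming speech and noise are uncorrelated.
    R_ss = R_yy - R_nn
    e_ref = np.zeros(R_yy.shape[0])
    e_ref[ref] = 1.0
    # w = (R_ss + mu * R_nn)^{-1} R_ss e_ref, computed without an explicit inverse.
    return np.linalg.solve(R_ss + mu * R_nn, R_ss @ e_ref)


def noise_reduction_db(w, R_nn, ref=0):
    """Reference-mic noise power divided by output noise power w^H R_nn w, in dB."""
    out_noise = np.real(np.vdot(w, R_nn @ w))
    in_noise = np.real(R_nn[ref, ref])
    return 10.0 * np.log10(in_noise / out_noise)


def choose_mu(R_yy, R_nn, target_nr_db, mu_grid=(0.5, 1, 2, 4, 8, 16), ref=0):
    """Hypothetical rule: first (smallest) mu on the grid reaching the target."""
    for mu in mu_grid:
        w = sdw_mwf_weights(R_yy, R_nn, mu, ref)
        if noise_reduction_db(w, R_nn, ref) >= target_nr_db:
            return mu, w
    # Target not reached: fall back to the last (largest) mu tried.
    return mu, w


# Usage on a synthetic 4-mic bin with a rank-1 speech source and white noise.
a = np.array([1, 0.8 + 0.2j, 0.5 - 0.5j, 0.3 + 0.4j])  # hypothetical steering vector
R_nn = np.eye(4, dtype=complex)
R_yy = np.outer(a, a.conj()) + R_nn
mu, w = choose_mu(R_yy, R_nn, target_nr_db=10.0)
print(mu, round(noise_reduction_db(w, R_nn), 1))  # prints: 4 12.3
```

With mu = 1 the filter reduces to the standard MWF, R_yy^{-1} R_ss e_ref. Larger mu removes more noise but distorts the target more, which is the tradeoff the abstract refers to. For a noisy frame y in that bin, the enhanced coefficient is w^H y, that is `np.vdot(w, y)`.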
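ROVER (Recognizer Output Voting Error Reduction) combines the word sequences produced by several recognizers, here presumably the DNN, CNN, and LSTM systems, by voting. The abstract does not describe the configuration, so the sketch below is a simplified, hypothetical version:

- It assumes the hypotheses are already aligned slot by slot. Real ROVER builds this alignment, a word transition network, by dynamic programming.
- It uses plain frequency voting without word confidences.
- `rover_vote` and the `@` deletion marker are illustrative names.

```python
from collections import Counter

NULL = "@"  # hypothetical marker for a deletion (null arc) in an aligned slot


def rover_vote(aligned_hyps):
    """Frequency-only ROVER-style vote over hypotheses that are already aligned.

    aligned_hyps: one word list per recognizer, all the same length, where
    position i in every list refers to the same slot. The alignment step of
    real ROVER is omitted. Ties go to the system listed first.
    """
    if len({len(h) for h in aligned_hyps}) != 1:
        raise ValueError("need at least one hypothesis, all aligned to one length")
    output = []
    for slot in zip(*aligned_hyps):
        counts = Counter(slot)
        top = max(counts.values())
        # First word, in system order, that reaches the highest vote count.
        winner = next(word for word in slot if counts[word] == top)
        if winner != NULL:  # a winning deletion emits nothing
            output.append(winner)
    return output


# Hypothetical outputs of three systems, pre-aligned into four slots.
hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sad", "down"],
    ["a", "cat", "sat", "down"],
]
print(rover_vote(hyps))  # prints: ['the', 'cat', 'sat', 'down']
```

Each system's errors are outvoted wherever the other two agree, which is why combining diverse models helps. Confidence-weighted ROVER would replace the raw counts with a weighted mix of vote frequency and word confidence.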