LG · Feb 16, 2018

Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy

Rasool Fakoor, Xiaodong He, Ivan Tashev, Shuayb Zarar

arXiv:1802.05874v14.710 citations

Originality: Incremental advance

AI Analysis

This work addresses the trade-off between speech quality and recognition accuracy in speech enhancement systems. It represents an incremental advance with specific, measurable gains.

The paper addresses the challenge of improving both perceptual quality and recognition rate in speech enhancement. It proposes a constrained convolutional-recurrent network method and reports additional improvements of 24.5% in PESQ and 51.3% in WER over the best-performing methods in the literature.

For a speech-enhancement algorithm, it is highly desirable to simultaneously improve perceptual quality and recognition rate. Owing to computational costs and model complexity, it is challenging to train a model that effectively optimizes both metrics at the same time. In this paper, we propose a method for speech enhancement that combines local and global contextual structure through convolutional-recurrent neural networks to improve perceptual quality. At the same time, we introduce a new constraint on the objective function, using a language model/decoder, that limits the impact on recognition rate. Based on experiments conducted with real user data, we demonstrate that our new context-augmented machine-learning approach for speech enhancement improves PESQ and WER by an additional 24.5% and 51.3%, respectively, when compared to the best-performing methods in the literature.
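The idea of constraining an enhancement objective so that recognition is not degraded can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not specify. It is a generic penalty-style objective, assuming a mean-squared-error quality term and a hinge penalty that activates only when a recognition loss exceeds a chosen budget. The names `mse`, `constrained_objective`, `recog_loss`, `recog_budget`, and `penalty_weight` are hypothetical and chosen for illustration only.

```python
def mse(enhanced, clean):
    """Mean squared error between two equal-length feature sequences."""
    if len(enhanced) != len(clean):
        raise ValueError("sequences must have the same length")
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)


def constrained_objective(enhanced, clean, recog_loss, recog_budget,
                          penalty_weight=1.0):
    """Quality loss plus a hinge penalty on the recognition loss.

    Assumption (not from the paper): the constraint is expressed as a
    soft penalty that is zero while recog_loss stays within recog_budget.
    """
    quality_loss = mse(enhanced, clean)
    # Penalize only the amount by which the recognition loss exceeds the budget.
    violation = max(0.0, recog_loss - recog_budget)
    return quality_loss + penalty_weight * violation
```

In this sketch, a larger `penalty_weight` trades enhancement quality for a tighter bound on recognition degradation. In practice `recog_loss` would come from a recognizer or language model/decoder evaluated on the enhanced signal.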
