CLJan 11, 2016

Environmental Noise Embeddings for Robust Speech Recognition

arXiv:1601.02553v24.830 citations

Originality Highly original

AI Analysis

This work addresses robust speech recognition for users in noisy and reverberant environments, representing a novel method for a known bottleneck rather than an incremental improvement.

The authors tackled the problem of speech recognition in noisy environments by proposing a deep neural network that predicts the acoustic environment and uses its embedding to improve acoustic modeling, resulting in significant accuracy improvements across multiple datasets including CHiME-3 and Aurora4, outperforming existing methods like multi-condition training and i-vector frameworks.

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system in being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on Resource Management, CHiME-3 task, and Aurora4, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, i-vector framework, and multi-task learning on both in-domain noise and unseen noise.

View on arXiv PDF

Similar