CL, Jan 11, 2016

Environmental Noise Embeddings for Robust Speech Recognition

arXiv:1601.02553v2, 30 citations
AI Analysis

This work addresses robust speech recognition in noisy and reverberant environments. It proposes a new method for a known bottleneck rather than an incremental improvement.

The authors propose a deep neural network that predicts the acoustic environment and feeds the resulting embedding into the acoustic model. In experiments on Resource Management, CHiME-3, and Aurora4, the approach significantly improves recognition accuracy and outperforms multi-condition training, noise-aware training, the i-vector framework, and multi-task learning.

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on Resource Management, the CHiME-3 task, and Aurora4, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, the i-vector framework, and multi-task learning on both in-domain noise and unseen noise.
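
The data flow described in the abstract can be illustrated with a minimal NumPy sketch: an environment network with a narrow bottleneck layer, whose bottleneck activations are concatenated with the acoustic features before they enter the acoustic model. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes, the ReLU activations, the randomly initialized (untrained) weights, and all function names. It shows only the forward pass, not training.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
FEAT_DIM = 40        # acoustic feature size per frame
HIDDEN_DIM = 64      # hidden layer width
BOTTLENECK_DIM = 8   # size of the noise embedding
NUM_ENVS = 4         # number of environment classes
NUM_STATES = 100     # number of acoustic-model output classes


def init_layer(n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dense(x, layer, activation=None):
    w, b = layer
    y = x @ w + b
    return activation(y) if activation else y


# Environment network: features -> hidden -> bottleneck -> environment classes.
env_hidden = init_layer(FEAT_DIM, HIDDEN_DIM)
env_bottleneck = init_layer(HIDDEN_DIM, BOTTLENECK_DIM)
env_output = init_layer(BOTTLENECK_DIM, NUM_ENVS)

# Acoustic model: its input is the features plus the noise embedding.
am_hidden = init_layer(FEAT_DIM + BOTTLENECK_DIM, HIDDEN_DIM)
am_output = init_layer(HIDDEN_DIM, NUM_STATES)


def noise_embedding(features):
    """Bottleneck activations of the environment network."""
    h = dense(features, env_hidden, relu)
    return dense(h, env_bottleneck, relu)


def environment_posteriors(features):
    """Environment class posteriors (the target the embedding is trained on)."""
    return softmax(dense(noise_embedding(features), env_output))


def acoustic_model_posteriors(features):
    """Acoustic-model posteriors from features concatenated with the embedding."""
    augmented = np.concatenate([features, noise_embedding(features)], axis=-1)
    h = dense(augmented, am_hidden, relu)
    return softmax(dense(h, am_output))


frames = rng.standard_normal((5, FEAT_DIM))     # 5 synthetic frames
print(noise_embedding(frames).shape)            # (5, 8)
print(acoustic_model_posteriors(frames).shape)  # (5, 100)
```

In this sketch the embedding is computed per frame. Whether the embedding is computed per frame or pooled over an utterance is a design choice the abstract does not specify.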

