LG, CL, NE, ML | Apr 7, 2015

Deep Recurrent Neural Networks for Acoustic Modelling

arXiv:1504.01482v1 | 44 citations
AI Analysis

This work targets acoustic-model accuracy in automatic speech recognition (ASR) and represents an incremental architectural improvement.

The authors tackle acoustic modelling for automatic speech recognition by proposing a deep recurrent neural network architecture that stacks a DNN with time convolution, a bidirectional LSTM, and a final DNN. The model achieves a 3.47% word error rate on the WSJ eval92 task, a relative improvement of more than 8% over the baseline DNN models.
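To make the "relative improvement" figure concrete, the following is a minimal sketch of the arithmetic. The function names are hypothetical, and the baseline WER itself is not quoted on this page, so the code only derives what a gain of more than 8% relative would imply for the baseline given the reported 3.47% WER.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, relative_gain: float) -> float:
    """Baseline WER at which `new_wer` is exactly `relative_gain` better."""
    if not 0 <= relative_gain < 1:
        raise ValueError("relative_gain must be in [0, 1)")
    return new_wer / (1.0 - relative_gain)


reported_wer = 3.47   # WER (%) reported for WSJ eval92
claimed_gain = 0.08   # "more than 8% relative", so this is a lower bound

# Any baseline above this threshold gives more than 8% relative improvement.
threshold = implied_baseline(reported_wer, claimed_gain)
print(f"implied baseline WER > {threshold:.2f}%")  # implied baseline WER > 3.77%
```

Under these assumptions, the claim amounts to a baseline DNN WER above roughly 3.77%, which shows that an 8% relative gain corresponds to about 0.3 points of absolute WER at this operating point.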

We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution a TC-DNN-BLSTM-DNN model: it combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM) and a final DNN. The first DNN acts as a feature processor for our model, the BLSTM then generates a context from the sequential acoustic signal, and the final DNN takes the context and models the posterior probabilities of the acoustic states. We achieve a WER of 3.47% on the Wall Street Journal (WSJ) eval92 task, more than 8% relative improvement over the baseline DNN models.
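The stage order described in the abstract can be illustrated with a small shape walk-through. This is only a sketch under stated assumptions: time convolution is modelled here as a shared layer applied to a sliding window of neighbouring frames (which is what a 1-D convolution over time computes), and every layer size, the window width, and the names `Sizes`, `stack_context`, and `trace_shapes` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizes:
    """Hypothetical layer sizes; none of these values come from the paper."""
    feat_dim: int = 40      # acoustic features per frame
    context: int = 5        # neighbouring frames on each side of the window
    dnn1_dim: int = 512     # output width of the first (TC) DNN
    lstm_dim: int = 256     # hidden units per LSTM direction
    dnn2_dim: int = 512     # hidden width of the final DNN
    num_states: int = 3000  # acoustic states scored by the output layer


def stack_context(frames, context):
    """Concatenate each frame with `context` neighbours on both sides.

    Edge frames are repeated as padding, so the output has one window per
    input frame. A layer shared across these windows acts as a convolution
    over time.
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def trace_shapes(num_frames, s=Sizes()):
    """Return (stage, shape) pairs for one utterance of `num_frames` frames."""
    window_dim = s.feat_dim * (2 * s.context + 1)
    return [
        ("input features", (num_frames, s.feat_dim)),
        ("time window", (num_frames, window_dim)),
        ("first DNN (feature processor)", (num_frames, s.dnn1_dim)),
        # Forward and backward LSTM outputs are concatenated per frame.
        ("BLSTM context", (num_frames, 2 * s.lstm_dim)),
        ("final DNN", (num_frames, s.dnn2_dim)),
        ("state posteriors", (num_frames, s.num_states)),
    ]


# Tiny usage example: 4 frames of 2 features, one neighbour on each side.
frames = [[float(t)] * 2 for t in range(4)]
stacked = stack_context(frames, context=1)
for stage, shape in trace_shapes(num_frames=100):
    print(f"{stage:32s} {shape}")
```

The sequence length is preserved at every stage, so the final layer emits one posterior distribution over acoustic states per input frame. The BLSTM stage is the only one that sees the whole utterance, which matches the abstract's description of it as the component that generates context.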

