NE AI CL LG AS

Oct 30, 2015

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

arXiv:1510.08983v2

33.5297 citations

Originality: Incremental advance

AI Analysis

This work targets speech recognition accuracy in noisy, distant-microphone settings and is an incremental advance in LSTM architecture.

The paper addresses the difficulty of training deeper long short-term memory (LSTM) networks for distant speech recognition by introducing highway connections that alleviate vanishing gradients. It reports a word error rate (WER) of 43.9%/47.7% on the AMI (SDM) dev/eval sets, a 15.7% and 5.3% relative improvement over the DNN and DLSTM baselines, respectively.
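
The "relative improvement" figures are presumably relative WER reductions, that is, the drop in WER divided by the baseline WER. The sketch below shows that arithmetic only; the function name is an assumption and the numbers in the usage line are hypothetical, not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the paper).
    print(relative_wer_reduction(50.0, 45.0))
```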

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the gradient vanishing problem when building deeper LSTMs. We further introduce the latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). Our novel model obtains $43.9/47.7\%$ WER on AMI (SDM) dev and eval sets, outperforming all previous works. It beats the strong DNN and DLSTM baselines with $15.7\%$ and $5.3\%$ relative improvement respectively.
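
To make the idea of a gated direct connection between memory cells concrete, here is a minimal sketch of one time step of an upper LSTM layer that also receives the cell state of the layer below through a carry gate `d`. It is an illustration under simplifying assumptions, not the authors' implementation: the function names, the weight initialization, and the absence of peephole connections are all choices made here, the carry gate is computed from the layer input and previous hidden state only, and both layers are assumed to have the same number of cells.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(params, name, z, act):
    """Apply act(W @ z + b) for the named gate; W is a list of rows."""
    W, b = params[name]
    return [act(sum(w * v for w, v in zip(row, z)) + bias)
            for row, bias in zip(W, b)]


def init_params(n_in, n_hid, seed=0):
    """Small random weights for gates i, f, o, candidate g, and carry gate d."""
    rng = random.Random(seed)
    params = {}
    for name in ("i", "f", "o", "g", "d"):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)]
             for _ in range(n_hid)]
        params[name] = (W, [0.0] * n_hid)
    return params


def highway_lstm_step(params, x, h_prev, c_prev, c_lower):
    """One time step of an upper layer with a highway (carry) connection.

    x        input at this time step (the lower layer's hidden output)
    h_prev   this layer's hidden state from the previous time step
    c_prev   this layer's cell state from the previous time step
    c_lower  the lower layer's cell state at the current time step
    """
    z = list(x) + list(h_prev)          # concatenated gate input
    i = gate(params, "i", z, sigmoid)   # input gate
    f = gate(params, "f", z, sigmoid)   # forget gate
    o = gate(params, "o", z, sigmoid)   # output gate
    g = gate(params, "g", z, math.tanh) # candidate cell update
    d = gate(params, "d", z, sigmoid)   # carry gate (simplified, assumption)
    # Standard LSTM cell update plus the gated term from the layer below.
    c = [dk * cl + fk * cp + ik * gk
         for dk, cl, fk, cp, ik, gk in zip(d, c_lower, f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


if __name__ == "__main__":
    n_in, n_hid = 3, 4
    params = init_params(n_in, n_hid)
    x = [0.5, -0.2, 0.1]
    zeros = [0.0] * n_hid
    h, c = highway_lstm_step(params, x, zeros, zeros, [1.0] * n_hid)
    print(h)
    print(c)
```

Because the term `d * c_lower` enters the cell update additively, the lower layer's cell state can reach the upper layer without passing through a squashing nonlinearity, which is the sense in which such a link gives gradients a more direct path across layers. When `d` is close to zero the step reduces to an ordinary LSTM update.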
