End-to-end attention-based distant speech recognition with Highway LSTM
This work addresses speech recognition in noisy, distant settings, but its contribution is incremental, since it builds on existing attention-based models.
The authors tackled distant speech recognition by extending end-to-end attention-based models with multichannel input and Highway LSTM, achieving improved performance on the AMI benchmark.
End-to-end attention-based models have been shown to be competitive alternatives to conventional DNN-HMM models in speech recognition systems. In this paper, we extend existing end-to-end attention-based models so that they can be applied to the Distant Speech Recognition (DSR) task. Specifically, we propose an end-to-end attention-based speech recognizer with multichannel input that performs sequence prediction directly at the character level. To improve performance, we also incorporate Highway long short-term memory (HLSTM), which outperforms previous models on the AMI distant speech recognition task.
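The abstract does not spell out the HLSTM equations. As a rough illustration only, the sketch below follows the commonly cited Highway LSTM formulation, in which a depth gate d carries the cell state of the layer below directly into the current layer's cell state: c = d * c_lower + f * c_prev + i * candidate. Everything here is an assumption rather than the authors' implementation: the function names (depth_gate, highway_cell_state), the per-unit peephole weights w_c and w_l, and the use of an already-projected input x_proj.

```python
import math


def sigmoid(z):
    """Logistic function used for all gates."""
    return 1.0 / (1.0 + math.exp(-z))


def depth_gate(x_proj, c_prev, c_lower, w_c, w_l, bias):
    """Hypothetical depth gate, one value per hidden unit.

    x_proj  : input already projected to the hidden size (assumed given)
    c_prev  : this layer's cell state at the previous time step
    c_lower : the lower layer's cell state at the current time step
    w_c, w_l: per-unit (peephole-style) weights, an assumption of this sketch
    """
    return [
        sigmoid(b + xp + wc * cp + wl * cl)
        for b, xp, wc, cp, wl, cl in zip(bias, x_proj, w_c, c_prev, w_l, c_lower)
    ]


def highway_cell_state(c_lower, c_prev, i_gate, f_gate, d_gate, candidate):
    """Elementwise cell update: c = d * c_lower + f * c_prev + i * candidate.

    With d_gate fixed at 0 this reduces to the standard LSTM cell update.
    """
    return [
        d * cl + f * cp + i * g
        for d, cl, f, cp, i, g in zip(d_gate, c_lower, f_gate, c_prev, i_gate, candidate)
    ]


if __name__ == "__main__":
    # Unit 0 copies the lower layer's cell; unit 1 behaves like a plain LSTM unit.
    c = highway_cell_state(
        c_lower=[2.0, 3.0],
        c_prev=[4.0, 8.0],
        i_gate=[0.0, 1.0],
        f_gate=[0.0, 0.5],
        d_gate=[1.0, 0.0],
        candidate=[0.5, -1.0],
    )
    print(c)
```

The point of the gated direct connection is that a unit whose depth gate is open passes the lower layer's cell state upward unchanged, which is generally understood to ease the training of deeper stacked LSTMs.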