NE LG ML · Dec 4, 2014

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

arXiv:1412.1602v1 · 488 citations

Originality: Incremental advance

AI Analysis

This work addresses speech recognition for applications that require end-to-end processing, but it is an incremental advance: it matches, rather than surpasses, existing HMM-based methods.

The paper tackles continuous speech recognition by replacing traditional HMMs with an attention-based recurrent neural network that directly outputs phonemes, achieving phoneme error rates comparable to those of state-of-the-art HMM-based decoders on the TIMIT dataset.
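The phoneme error rate (PER) used to compare the systems is conventionally the edit (Levenshtein) distance between the reference and hypothesized phoneme sequences, summed over the test set and divided by the total number of reference phonemes, i.e. PER = (S + D + I) / N. The following is a minimal sketch of that standard definition, assuming each utterance is already a list of phoneme labels; any label mapping or scoring conventions applied before comparison are not shown, and the function names are illustrative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                cur[j - 1] + 1,          # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER: total edit errors divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total
```

For example, comparing the reference `sil dh ax k ae t sil` with the hypothesis `sil dh ax k ah t` yields one substitution and one deletion, so the PER for that single utterance is 2/7.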

We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bidirectional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created from a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates comparable to those of state-of-the-art HMM-based decoders on the TIMIT dataset.
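The attention step described above can be illustrated with a generic content-based (additive) attention in the style of Bahdanau et al. This is a minimal sketch, not the paper's exact formulation: the scoring function, the hypothetical parameter names `W`, `V` and `w`, and the toy dimensions are assumptions. At each output step the decoder state is compared with every encoder state, the scores are normalized with a softmax, and the context is the weighted average of the encoder states. Because the weights are soft, the "subset" of inputs is selected by concentrating weight on a few encoder states rather than by a hard choice.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max score first)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(
    decoder_state: Vector,
    encoder_states: List[Vector],
    W: Matrix,  # hypothetical: projects the previous decoder state
    V: Matrix,  # hypothetical: projects each encoder state
    w: Vector,  # hypothetical: reduces the hidden layer to a scalar score
) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoding step.

    Assumed additive score: score_j = w . tanh(W s + V h_j).
    """
    ws = matvec(W, decoder_state)
    scores = []
    for h in encoder_states:
        vh = matvec(V, h)
        hidden = [math.tanh(a + b) for a, b in zip(ws, vh)]
        scores.append(sum(a * b for a, b in zip(w, hidden)))
    weights = softmax(scores)
    # Context: convex combination of encoder states under the weights.
    dim = len(encoder_states[0])
    context = [
        sum(weights[j] * encoder_states[j][k] for j in range(len(encoder_states)))
        for k in range(dim)
    ]
    return weights, context
```

In a full model the encoder states would come from the bidirectional RNN and the context would be fed to the decoder RNN together with the previously emitted phoneme; here identity matrices for `W` and `V` with toy 2-dimensional states are enough to see which encoder state receives the most weight.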

