AS LG SD SP ML

Nov 6, 2018

Bidirectional Quaternion Long-Short Term Memory Recurrent Neural Networks for Speech Recognition

Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato De Mori

arXiv:1811.02566v15.921 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in speech recognition systems by enhancing feature representation, though it appears incremental as it builds on existing LSTM methods.

The paper addresses the weak modeling of internal dependencies within multidimensional features in speech recognition. It proposes a quaternion long-short term memory (QLSTM) recurrent neural network, which outperformed LSTMs with up to 2.8 times fewer learning parameters in experiments on a memory copy-task and the Wall Street Journal dataset.

Recurrent neural networks (RNNs) are at the core of modern automatic speech recognition (ASR) systems. In particular, long-short term memory (LSTM) recurrent neural networks have achieved state-of-the-art results in many speech recognition tasks, due to their efficient representation of long- and short-term dependencies in sequences of inter-dependent features. Nonetheless, internal dependencies within the elements composing multidimensional features are only weakly considered by traditional real-valued representations. We propose a novel quaternion long-short term memory (QLSTM) recurrent neural network that takes into account both the external relations between the features composing a sequence and these internal latent structural dependencies, using the quaternion algebra. QLSTMs are compared to LSTMs on a memory copy-task and on a realistic speech recognition application with the Wall Street Journal (WSJ) dataset. QLSTM achieves better performance in both experiments with up to $2.8$ times fewer learning parameters, leading to a more expressive representation of the information.
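To make the role of the quaternion algebra more concrete, the following is a minimal sketch, not the authors' implementation. It shows the Hamilton product, which is the standard multiplication rule for quaternions, and a rough weight count for a single dense map. The function names, the (r, x, y, z) tuple layout, and the 256-feature layer size are assumptions chosen for illustration. The count assumes that one quaternion weight (4 real numbers) links one input quaternion to one output quaternion, whereas a real-valued map needs one weight per input-output pair.

```python
# Illustrative sketch only; names and sizes are hypothetical.

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (r, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def real_dense_params(n_in, n_out):
    """Weights in a real-valued dense map from n_in to n_out features."""
    return n_in * n_out


def quaternion_dense_params(n_in, n_out):
    """Real-valued weights in a quaternion dense map over the same features.

    Features are grouped four at a time into quaternions; each quaternion
    weight stores 4 real numbers.
    """
    assert n_in % 4 == 0 and n_out % 4 == 0
    return 4 * (n_in // 4) * (n_out // 4)


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
ij = hamilton_product(i, j)  # the basis element k
ji = hamilton_product(j, i)  # minus k: the product is not commutative

# Ratio of weight counts for one 256-to-256 dense map (assumed size).
ratio = real_dense_params(256, 256) / quaternion_dense_params(256, 256)
```

Under these assumptions, the single-layer weight ratio is 4. The "up to $2.8$ times" figure in the abstract refers to complete models, and this sketch does not attempt to reproduce it.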
