CL · LG · SD · AS · Oct 4, 2018

Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

arXiv:1810.03459v1 · 130 citations
Originality: Incremental advance
AI Analysis

This work addresses low-resource speech recognition for multilingual applications, representing an incremental improvement in transfer learning methods.

The paper tackles low-resource automatic speech recognition by building a multilingual sequence-to-sequence model from data in 10 languages and transferring it to 4 other languages. The transferred models achieve substantial gains over monolingual models. Adding a recurrent neural network language model during decoding further improves word error rate, reaching performance comparable to models trained with twice as much data.

The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. Its advantage is that models can be trained without a lexicon or alignments. However, it requires more data than conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using transfer learning. We also explore different architectures for improving the prior multilingual seq2seq model, and discuss the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model yields substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also brings significant improvements in %WER, and achieves recognition performance comparable to models trained with twice as much training data.
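
The abstract does not say how the RNNLM is combined with the seq2seq model during decoding. One common way is a log-linear combination of the two models' token scores at each decoding step, often called shallow fusion. The sketch below illustrates that idea only and is not the paper's implementation. The function names, the `lm_weight` value, and the toy probabilities are all hypothetical.

```python
import math

# Assumed floor for tokens the language model does not score (hypothetical value).
LM_FLOOR_LOGP = math.log(1e-10)


def fuse_scores(s2s_logp, lm_logp, lm_weight=0.3):
    """Return fused scores: log p_s2s(token) + lm_weight * log p_lm(token)."""
    return {
        token: logp + lm_weight * lm_logp.get(token, LM_FLOOR_LOGP)
        for token, logp in s2s_logp.items()
    }


def pick_next_token(s2s_logp, lm_logp, lm_weight=0.3):
    """Greedy choice of the next token under the fused score."""
    fused = fuse_scores(s2s_logp, lm_logp, lm_weight)
    return max(fused, key=fused.get)


# Toy distributions for a single decoding step (made-up numbers).
s2s_step = {"a": math.log(0.5), "b": math.log(0.4), "c": math.log(0.1)}
lm_step = {"a": math.log(0.1), "b": math.log(0.8), "c": math.log(0.1)}

# With lm_weight=0.0 the seq2seq model decides alone; a positive weight
# lets the language model change the ranking of candidates.
without_lm = pick_next_token(s2s_step, lm_step, lm_weight=0.0)
with_lm = pick_next_token(s2s_step, lm_step, lm_weight=0.5)
```

In a real decoder the same fused score would be applied to every hypothesis inside beam search, and the weight would be tuned on held-out data.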

