CL, SD, AS | May 7, 2021

Efficient Weight factorization for Multilingual Speech Recognition

arXiv:2105.03010v1 | 23 citations
Originality: Highly original
AI Analysis

This work addresses efficient multilingual speech recognition. Its architectural change is incremental but novel, and it yields strong gains in the specific multilingual settings the authors test.

The paper tackles the challenge of optimizing a single speech recognition model for many languages at once. It proposes an efficient weight factorization method that decomposes each weight matrix into a shared component and a language-dependent component. Evaluated in two multilingual settings (7 and 27 languages), the method achieves relative word error rate reductions of 26% for an LSTM model and 27% for a Transformer model.
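One way to write this decomposition, assuming the simplest additive form in which the language-dependent term is a single rank-1 outer product (the paper's exact parameterization may differ), for language $l$ and a layer with $m$ outputs and $n$ inputs:

$$
W_l = W_s + u_l v_l^{\top}, \qquad W_s \in \mathbb{R}^{m \times n},\; u_l \in \mathbb{R}^{m},\; v_l \in \mathbb{R}^{n}
$$

$$
y = W_l x = W_s x + u_l \,(v_l^{\top} x)
$$

Under this assumption each language adds only $m + n$ parameters per layer instead of the $mn$ a full per-language matrix would need. For example, with $m = n = 512$ that is $1{,}024$ extra parameters rather than $262{,}144$. The second line also shows that the language term costs one dot product and one scaled vector add, so the dense matrix $W_l$ never has to be materialized.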

End-to-end multilingual speech recognition trains a single model on a combined speech corpus covering many languages, so that one neural network transcribes all of them. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all languages simultaneously. In this paper we propose a novel multilingual architecture that targets the core operation in neural networks: the linear transformation. The key idea is to assign fast weight matrices to each language by decomposing each weight matrix into a shared component and a language-dependent component. The latter is then factorized into vectors under a rank-1 assumption to reduce the number of parameters per language. This efficient factorization scheme proves effective in two multilingual settings with 7 and 27 languages, reducing word error rates by 26% and 27% relative for two popular architectures, LSTM and Transformer, respectively.
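The sketch below shows how a linear layer with a shared matrix plus a per-language rank-1 term might compute its output. It assumes an additive form `W_s + u_l v_l^T`. The class `FactorizedLinear`, its methods, and the toy language codes are hypothetical names chosen for illustration and do not come from the paper's code. Plain Python lists keep it dependency-free.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


class FactorizedLinear:
    """Hypothetical layer: shared W_s plus a rank-1 term u_l v_l^T per language.

    Assumes an additive decomposition; the paper's exact form may differ.
    """

    def __init__(self, shared: Matrix, factors: Dict[str, Tuple[Vector, Vector]]):
        self.shared = shared    # m x n matrix shared by all languages
        self.factors = factors  # language code -> (u of length m, v of length n)

    def forward(self, x: Vector, lang: str) -> Vector:
        """Compute W_s x + u_l (v_l . x) without building the dense W_l."""
        u, v = self.factors[lang]
        base = matvec(self.shared, x)
        scale = dot(v, x)  # scalar v_l^T x
        return [b + ui * scale for b, ui in zip(base, u)]

    def dense_weight(self, lang: str) -> Matrix:
        """Materialize W_l = W_s + u_l v_l^T (only for checking the fast path)."""
        u, v = self.factors[lang]
        return [[self.shared[i][j] + u[i] * v[j] for j in range(len(v))]
                for i in range(len(u))]

    def params_per_language(self) -> int:
        """Extra parameters each language adds: m + n instead of m * n."""
        return len(self.shared) + len(self.shared[0])


if __name__ == "__main__":
    layer = FactorizedLinear(
        shared=[[1.0, 0.0], [0.0, 1.0]],
        factors={"de": ([1.0, 2.0], [3.0, 4.0]), "en": ([0.0, 0.0], [0.0, 0.0])},
    )
    x = [1.0, 1.0]
    print(layer.forward(x, "de"))  # [8.0, 15.0]
    print(layer.forward(x, "en"))  # [1.0, 1.0]
    print(matvec(layer.dense_weight("de"), x))  # same as the fast path
```

The factorized path avoids ever forming the dense per-language matrix, which is where both the parameter and compute savings come from. `dense_weight` exists only to confirm that the two computations agree.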
