AS CL LG
Aug 3, 2020

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

arXiv:2008.00768v1
61 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient multilingual speech synthesis for applications that require voice cloning and code-switching. The advance is incremental, since the model builds on the existing Tacotron 2 architecture.

The paper tackles multilingual text-to-speech synthesis by introducing a meta-learning approach based on contextual parameter generation, which enables natural-sounding speech in more languages and with less training data than previous approaches. In subjective evaluations, it produces more natural and accurate code-switching speech than the baselines.

We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation and produces natural-sounding multilingual speech using more languages and less training data than previous approaches. Our model is based on Tacotron 2 with a fully convolutional input text encoder whose weights are predicted by a separate parameter generator network. To boost voice cloning, the model uses an adversarial speaker classifier with a gradient reversal layer that removes speaker-specific information from the encoder. We arranged two experiments to compare our model with baselines using various levels of cross-lingual parameter sharing, in order to evaluate: (1) stability and performance when training on low amounts of data, (2) pronunciation accuracy and voice quality of code-switching synthesis. For training, we used the CSS10 dataset and our new small dataset based on Common Voice recordings in five languages. Our model is shown to effectively share information across languages and according to a subjective evaluation test, it produces more natural and accurate code-switching speech than the baselines.
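
The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the layer sizes, the use of a learned language embedding as the generator's input, the single linear projection, and all names (ParameterGenerator, conv1d_same, GradientReversal) are assumptions made for illustration. The first part shows contextual parameter generation, where the weights of a convolutional encoder layer are produced by a generator rather than stored per language. The second part shows a gradient reversal layer, which passes activations through unchanged and flips the sign of the gradient coming back from the speaker classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NUM_LANGUAGES = 3
LANG_EMB_DIM = 4
IN_CH, OUT_CH, KERNEL = 8, 8, 3


class ParameterGenerator:
    """Maps a language embedding to the weights of one 1-D conv layer.

    Assumption: a single linear projection; the real generator may differ.
    """

    def __init__(self):
        self.lang_emb = rng.normal(size=(NUM_LANGUAGES, LANG_EMB_DIM))
        n_weights = OUT_CH * IN_CH * KERNEL
        self.proj_w = rng.normal(scale=0.1, size=(LANG_EMB_DIM, n_weights))
        self.proj_b = rng.normal(scale=0.1, size=(LANG_EMB_DIM, OUT_CH))

    def __call__(self, lang_id):
        e = self.lang_emb[lang_id]
        # The generator is shared by all languages; only the embedding differs.
        weight = (e @ self.proj_w).reshape(OUT_CH, IN_CH, KERNEL)
        bias = e @ self.proj_b
        return weight, bias


def conv1d_same(x, weight, bias):
    """Apply a 1-D convolution: x (IN_CH, T) -> (OUT_CH, T), zero-padded."""
    pad = KERNEL // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    steps = x.shape[1]
    out = np.empty((OUT_CH, steps))
    for t in range(steps):
        window = xp[:, t:t + KERNEL]  # shape (IN_CH, KERNEL)
        out[:, t] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out


class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, h):
        return h

    def backward(self, grad_from_classifier):
        # The encoder receives the reversed gradient, so it is pushed to
        # remove information that helps the speaker classifier.
        return -self.lam * grad_from_classifier


gen = ParameterGenerator()
x = rng.normal(size=(IN_CH, 5))       # stand-in for embedded input text
enc_lang0 = conv1d_same(x, *gen(0))   # encoder layer with language-0 weights
enc_lang1 = conv1d_same(x, *gen(1))   # same input, language-1 weights

grl = GradientReversal(lam=0.5)
reversed_grad = grl.backward(np.ones(OUT_CH))
```

Because the generator is shared and only the language embedding changes, most parameters are trained on data from every language, which is one plausible reading of how the model shares information across languages with little data per language.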

Code Implementations: 1 repo
