SD | CL | AS | Jan 14, 2019

Exploring Transfer Learning for Low Resource Emotional TTS

arXiv:1901.04276v1 | 70 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of data scarcity in emotional text-to-speech synthesis for low-resource applications, presenting an incremental improvement over existing methods.

The paper tackles the challenge of synthesizing emotional speech with limited data by fine-tuning a pre-trained text-to-speech model on small datasets for speaker adaptation and emotional style, achieving competitive performance with minimal data.

During the last few years, spoken language technologies have improved greatly thanks to Deep Learning. However, Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech across different speakers, styles, or emotions with little data remains challenging. In this paper, we investigate how to leverage fine-tuning of a pre-trained Deep Learning-based TTS model to synthesize speech from a small dataset of another speaker. We then investigate whether this model can be adapted for emotional TTS by fine-tuning the neutral TTS model on a small emotional dataset.

Code Implementations: 5 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
