SD · CL · AS · Nov 15, 2017

Emotional End-to-End Neural Speech Synthesizer

arXiv:1711.05447v2 · 123 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses emotional speech synthesis for applications like human-computer interaction, but it is incremental as it builds on existing Tacotron methods.

The authors tackled the problem of emotional speech synthesis by modifying Tacotron to address exposure bias and attention alignment irregularities, resulting in a model that successfully generates speech for given emotion labels.

In this paper, we introduce an emotional speech synthesizer based on Tacotron, a recent end-to-end neural model. Despite its benefits, we found that the original Tacotron suffers from the exposure bias problem and from irregular attention alignment. We address these problems by using the context vector and residual connections in the recurrent neural networks (RNNs). Our experiments showed that the model could be trained successfully and could generate speech for given emotion labels.

Code Implementations: 1 repo
