AI · Sep 21, 2023

Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech

arXiv:2309.11724v1 · 4 citations · h-index: 87 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for more natural and emotionally expressive speech synthesis. The advance is incremental: it builds on existing prosodic phrasing methods by adding emotion awareness.

The paper tackles prosodic phrasing for expressive emotion rendering in text-to-speech, a problem that has not been well studied, and reports that its emotion-aware model, EmoPP, outperforms all baselines in both objective and subjective evaluations of emotion expressiveness.

Prosodic phrasing is crucial to the naturalness and intelligibility of end-to-end Text-to-Speech (TTS). Natural speech contains both linguistic and emotional prosody. Because the study of prosodic phrasing has been linguistically motivated, prosodic phrasing for expressive emotion rendering has not been well studied. In this paper, we propose an emotion-aware prosodic phrasing model, termed EmoPP, to accurately mine the emotional cues of an utterance and predict appropriate phrase breaks. We first conduct objective observations on the ESD dataset to validate the strong correlation between emotion and prosodic phrasing. Objective and subjective evaluations then show that EmoPP outperforms all baselines and achieves remarkable performance in terms of emotion expressiveness. The audio samples and the code are available at https://github.com/AI-S2-Lab/EmoPP.

Code Implementations: 1 repo
