AS · LG · SD · Nov 7, 2022

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

arXiv:2211.03316v34.38 citations · h-index: 25 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for personalized, intelligible speech synthesis in applications such as assistive technology, though the advance is incremental because it builds on existing TTS methods.

The paper tackles the problem of synthesizing speech with controllable accents using a Conditional Variational Autoencoder, achieving effective accent manipulation as validated by objective and subjective evaluations.

Accent plays a significant role in speech communication, influencing how well a listener understands speech and conveying the speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. It can synthesize a selected speaker's voice and convert it to any desired target accent. Our experiments validate the effectiveness of the proposed framework using both objective and subjective evaluations. The results also show strong performance in manipulating accents in the synthesized speech. Overall, our proposed framework presents a promising avenue for future accented TTS research.
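
The abstract names a Conditional Variational Autoencoder but does not describe the architecture, so the following is only a generic sketch of the textbook CVAE ingredients, not the paper's model. It assumes a diagonal Gaussian posterior, a standard normal prior, a squared-error reconstruction term, and an accent label supplied to the decoder as a one-hot vector. All function names (`reparameterize`, `kl_to_standard_normal`, `decoder_input`, `cvae_loss`) and the `beta` weight are hypothetical and chosen for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def decoder_input(z, accent_id, n_accents):
    """Condition the decoder by appending a one-hot accent label to z.

    Assumption: one-hot conditioning is used here for simplicity; the paper
    may condition on accent in a different way.
    """
    one_hot = [1.0 if i == accent_id else 0.0 for i in range(n_accents)]
    return list(z) + one_hot


def cvae_loss(target, reconstruction, mu, log_var, beta=1.0):
    """Squared-error reconstruction term plus a beta-weighted KL term."""
    recon = sum((t - r) ** 2 for t, r in zip(target, reconstruction))
    return recon + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.2, -0.1], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)
    # Same latent sample, two different (hypothetical) accent labels.
    print(decoder_input(z, accent_id=0, n_accents=3))
    print(decoder_input(z, accent_id=2, n_accents=3))
```

In this sketch, changing `accent_id` while holding `z` fixed is the mechanism by which a conditional model could vary accent independently of the other latent factors.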

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
