SD AI CL HC AS · Mar 6, 2021

Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system

Noé Tits, Kevin El Haddad, Thierry Dutoit

arXiv:2103.04097v18.68 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses controllability for TTS systems, but it is incremental as it focuses on assessment rather than introducing new methods.

The paper evaluates controllability in an expressive text-to-speech system trained on a dataset with varied styles. It reports that an objective correlation measure and a subjective user experiment can both be used to assess how well acoustic features align with the latent dimensions representing expressiveness.

In this paper, we study the controllability of an Expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker and containing great variability in styles and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and the dimensions of the latent space representing expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown an interface for Controllable Expressive TTS and asked to retrieve a synthetic utterance whose expressiveness subjectively corresponds to that of a reference utterance.
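The objective assessment described above can be illustrated with a minimal sketch. The abstract does not say which correlation measure or which acoustic features were used, so this example assumes a plain Pearson correlation and uses synthetic data; the function name `latent_feature_correlation` and the two stand-in features (one depending on a latent dimension, one pure noise) are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def latent_feature_correlation(latents, features):
    """Pearson correlation between every latent dimension and every feature.

    latents:  array of shape (n_utterances, n_latent_dims)
    features: array of shape (n_utterances, n_features)
    Returns an array of shape (n_latent_dims, n_features).
    """
    latents = np.asarray(latents, dtype=float)
    features = np.asarray(features, dtype=float)
    # Standardize each column (population standard deviation, ddof=0).
    z = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    f = (features - features.mean(axis=0)) / features.std(axis=0)
    # The mean of products of standardized values is the Pearson coefficient.
    return z.T @ f / latents.shape[0]


# Synthetic stand-in data (assumption: not real Blizzard 2013 measurements).
rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 2))
# Hypothetical feature 0 follows latent dimension 0 plus a little noise.
feature_a = 3.0 * latents[:, 0] + 0.1 * rng.normal(size=500)
# Hypothetical feature 1 is independent noise.
feature_b = rng.normal(size=500)
features = np.column_stack([feature_a, feature_b])

corr = latent_feature_correlation(latents, features)
print(np.round(corr, 2))
```

In this sketch, a large absolute value in row i, column j suggests that moving along latent dimension i changes feature j in a predictable way, which is the kind of relationship a controllability analysis looks for. Pearson correlation only captures linear relationships, so a rank-based measure may be a better fit if the dependence is monotonic but non-linear.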
