SD CL AS

Apr 6, 2022

Simple and Effective Unsupervised Speech Synthesis

Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

MIT

arXiv:2204.02524v3

15.119 citations

h-index: 84

Originality: Incremental advance

AI Analysis

This work enables speech synthesis in settings where labeled data is scarce or unavailable. The advance is incremental, since it builds on existing unsupervised speech recognition and neural speech synthesis techniques.

The paper tackles speech synthesis without human-labeled data by developing the first unsupervised system, which uses only unlabeled speech audio, unlabeled text, and a lexicon. In human evaluation, its naturalness and intelligibility are comparable to those of a supervised counterpart.

We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe. The framework leverages recent work in unsupervised speech recognition as well as existing neural-based speech synthesis. Using only unlabeled speech audio and unlabeled text as well as a lexicon, our method enables speech synthesis without the need for a human-labeled corpus. Experiments demonstrate the unsupervised system can synthesize speech similar to a supervised counterpart in terms of naturalness and intelligibility measured by human evaluation.
