CL · SD · AS · Apr 25, 2019

The Zero Resource Speech Challenge 2019: TTS without T

arXiv:1904.11469v2 · 126 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses text-to-speech synthesis for low-resource or unknown languages. Its contribution is incremental: it builds on existing unsupervised and TTS methods within a competition framework.

The paper tackles the problem of building a speech synthesizer without text or phonetic labels by proposing the Zero Resource Speech Challenge 2019. Participants used unsupervised methods to discover subword units and align them to the voice recordings for synthesis. The challenge drew 19 submitted systems from 10 teams, which were evaluated against a baseline system and a topline system.

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker's voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 10 teams and discuss the main results.
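
As a rough illustration of the unit-discovery step the abstract describes, the toy sketch below clusters frame-level features into discrete units and collapses runs of identical labels into (unit, duration) pairs, the kind of symbolic sequence a downstream synthesizer could consume in place of text. The use of k-means, the 2-D features, and all function names are assumptions chosen for illustration; the abstract does not say which method the baseline or the submitted systems use.

```python
from itertools import groupby

def nearest(frame, centroids):
    """Index of the centroid closest to `frame` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((x - c) ** 2 for x, c in zip(frame, centroids[k])))

def kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k frames (deterministic)."""
    centroids = [list(f) for f in frames[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(f, centroids)].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def discover_units(frames, centroids):
    """Label each frame with a unit ID, then merge runs of identical labels."""
    labels = [nearest(f, centroids) for f in frames]
    return [(unit, len(list(run))) for unit, run in groupby(labels)]

# Hypothetical stand-in for unlabeled "unit discovery" data: 2-D feature frames.
train = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 4.9), (0.0, 0.0), (4.9, 5.0)]
centroids = kmeans(train, k=2)

# Hypothetical frames from one utterance, in time order.
utterance = [(0.1, 0.1), (0.0, 0.2), (5.0, 5.0), (4.8, 5.2), (5.1, 5.0), (0.2, 0.0)]
units = discover_units(utterance, centroids)
print(units)  # [(0, 2), (1, 3), (0, 1)]
```

Each pair is a discovered unit ID and the number of consecutive frames it spans. No text or phonetic labels are used at any point.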

