ASSD Oct 6, 2020

The Academia Sinica Systems of Voice Conversion for VCC2020

arXiv:2010.02669v1, 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses voice conversion, but it is incremental: it adapts existing methods to the tasks of a specific challenge.

The paper tackles the two VCC2020 voice conversion tasks with a cascaded ASR+TTS structure that uses phonetic tokens as the TTS input, and reports that its systems performed well in the challenge's listening test.

This paper describes the Academia Sinica systems for the two tasks of the Voice Conversion Challenge 2020 (VCC2020): voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2). For both tasks, we followed the cascaded ASR+TTS structure, using phonetic tokens rather than text or characters as the TTS input. For Task 1, the TTS model took International Phonetic Alphabet (IPA) symbols as input. For Task 2, it took unsupervised phonetic symbols extracted by a vector-quantized variational autoencoder (VQVAE). In the evaluation, the listening test showed that our systems performed well in VCC2020.
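To make the Task 2 idea of "unsupervised phonetic symbols" more concrete, here is a minimal sketch of the vector-quantization step that a VQVAE uses to turn continuous frame features into discrete tokens. It is an illustration under stated assumptions, not the authors' implementation: the encoder and decoder networks are omitted, and the names `quantize` and `squared_distance`, the two-dimensional toy features, and the three-entry codebook are all hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame to the index of its nearest codebook vector.

    In a VQVAE the frames would be encoder outputs and the codebook would be
    learned jointly with the model; here both are supplied by hand.
    """
    tokens = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Toy data (assumed for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [0.1, 1.9]]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 1, 2]
```

In a cascaded ASR+TTS setup of the kind the abstract describes, a token sequence like this would stand in for text as the TTS input, so no transcription in the target language is required.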

