SD AS Oct 29, 2020

The IQIYI System for Voice Conversion Challenge 2020

arXiv:2010.15317v1
Originality: Synthesis-oriented
AI Analysis

This work addresses voice conversion for speech synthesis applications. It is incremental: it builds on existing PPG-based methods and adds specific improvements.

The paper presents IQIYI's voice conversion system for the Voice Conversion Challenge 2020, which uses an end-to-end approach based on PPG to convert speech while preserving content and prosody. The system achieved competitive results, ranking 2nd in ASV-based objective similarity evaluation and 5th in subjective evaluation for Task 2.

This paper presents the IQIYI voice conversion system (T24) for the Voice Conversion Challenge 2020. In the competition, each target speaker has 70 sentences. We built an end-to-end voice conversion system based on phonetic posteriorgrams (PPG). First, an ASR acoustic model computes the bottleneck (BN) feature, which represents the content-related information in the speech. Then the Mel feature is computed by an improved prosody Tacotron model. Finally, the Mel spectrum is converted to a waveform by an improved LPCNet. The evaluation results show that the system achieves good voice conversion quality: even though it uses 16 kHz rather than 24 kHz audio, the converted speech is relatively good in naturalness and similarity. Our best results are in the similarity evaluation of Task 2, where the system ranked 2nd in the ASV-based objective evaluation and 5th in the subjective evaluation.
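The three-stage data flow described in the abstract (ASR acoustic model to BN features, acoustic model to Mel frames, vocoder to waveform) can be sketched as below. This is only an illustrative outline, not the authors' implementation: the class name `VoiceConversionPipeline`, the `toy_*` functions, the 160-sample hop, and the feature sizes are all assumptions made for the example, and the toy stages are trivial stand-ins for the real ASR model, prosody Tacotron, and LPCNet.

```python
from dataclasses import dataclass
from typing import Callable, List

Frames = List[List[float]]   # one feature vector per frame
Waveform = List[float]       # mono audio samples


@dataclass
class VoiceConversionPipeline:
    """Hypothetical wrapper: each stage is a caller-supplied callable."""
    extract_bn: Callable[[Waveform], Frames]   # stage 1: content (BN) features
    predict_mel: Callable[[Frames], Frames]    # stage 2: Mel frames for the target voice
    vocode: Callable[[Frames], Waveform]       # stage 3: Mel frames to waveform

    def convert(self, source_wav: Waveform) -> Waveform:
        bn = self.extract_bn(source_wav)
        mel = self.predict_mel(bn)
        return self.vocode(mel)


# Assumed toy sizes, chosen only so the example runs.
HOP = 160      # 10 ms hop at 16 kHz
BN_DIM = 4
MEL_DIM = 8


def toy_extract_bn(wav: Waveform) -> Frames:
    # Stand-in for the ASR acoustic model: one vector per full hop of audio.
    frames = []
    for start in range(0, len(wav) - HOP + 1, HOP):
        chunk = wav[start:start + HOP]
        energy = sum(abs(x) for x in chunk) / HOP
        frames.append([energy] * BN_DIM)
    return frames


def toy_predict_mel(bn: Frames) -> Frames:
    # Stand-in for the Tacotron-style model: one Mel frame per BN frame.
    return [[sum(frame) / len(frame)] * MEL_DIM for frame in bn]


def toy_vocode(mel: Frames) -> Waveform:
    # Stand-in for the vocoder: emit HOP samples per Mel frame.
    wav: Waveform = []
    for frame in mel:
        wav.extend([frame[0]] * HOP)
    return wav


pipeline = VoiceConversionPipeline(toy_extract_bn, toy_predict_mel, toy_vocode)
source = [0.5] * 1600            # 0.1 s of audio at 16 kHz
converted = pipeline.convert(source)
print(len(converted))
```

Keeping each stage behind a plain callable mirrors the modular description in the abstract: any one stage could be swapped (for example, a different vocoder) without touching the others, as long as the frame rate stays consistent across stages.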

