SD AS Feb 21, 2022

AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

arXiv:2202.10020v1 20.666 citations

Originality: Incremental advance

AI Analysis

This work addresses voice conversion for speech synthesis applications, but it appears incremental as it builds on existing VQVC and AutoVC methods.

The paper tackles one-shot voice conversion by proposing AVQVC, a framework that combines vector quantization with AutoVC and applies a new training method to better separate content and timbre information in speech, improving sound quality over VQVC.

Voice conversion (VC) refers to changing the timbre of an utterance while retaining its linguistic content. Recently, many works have focused on disentanglement-based learning techniques that separate timbre from linguistic content in a speech signal. Once this separation succeeds, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework, called AVQVC, based on vector quantization voice conversion (VQVC) and AutoVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The results show that this approach outperforms VQVC at separating content and timbre, which improves the sound quality of the generated speech.
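To make the vector quantization idea concrete, the following is a minimal sketch of nearest-codebook lookup, not the authors' implementation. It assumes frame-level encoder outputs given as plain lists of floats and a small fixed codebook; the function names (quantize, mean_residual) and the toy values are hypothetical. Treating the quantized codes as content-like and the time-averaged residual as a timbre-like summary is an assumption made here for illustration, since the abstract does not spell out how AVQVC derives either representation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to its nearest codebook entry.

    Returns the chosen codebook indices and the quantized frames.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Pick the codebook entry with the smallest squared distance.
        idx = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


def mean_residual(
    frames: Sequence[Vector], quantized: Sequence[Vector]
) -> List[float]:
    """Average over time of (frame - quantized frame).

    Assumption for illustration: what quantization discards is
    summarized as one utterance-level, timbre-like vector.
    """
    dim = len(frames[0])
    totals = [0.0] * dim
    for frame, q in zip(frames, quantized):
        for d in range(dim):
            totals[d] += frame[d] - q[d]
    return [t / len(frames) for t in totals]


# Toy data (hypothetical): three 2-D frames and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, 0.2], [0.9, 1.2], [0.2, -0.1]]

indices, quantized = quantize(frames, codebook)
speaker_like = mean_residual(frames, quantized)
print(indices, speaker_like)
```

In this sketch the codebook is fixed by hand. In a trained model the codebook and encoder would be learned, and the training objective (including any contrastive term) would shape what the codes capture.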
