SD, CL, LG, AS | Oct 12, 2021

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

arXiv:2110.06280v1 | 45 citations | Has Code
Originality: Incremental advance
AI Analysis

This work provides an incremental improvement for the voice conversion and self-supervised speech representation communities by offering a toolkit and benchmarking analysis.

The paper addresses the cost of the supervised representations used in voice conversion by introducing S3PRL-VC, an open-source framework built on self-supervised speech representations. It reports similarity comparable to the top VCC2020 systems in the any-to-one setting, and state-of-the-art results among systems based on self-supervised representations in the any-to-any setting.

This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech representation (S3R) is valuable in its potential to replace the expensive supervised representation adopted by state-of-the-art VC systems. Moreover, we claim that VC is a good probing task for S3R analysis. In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as an any-to-any (A2A) setting. We also provide comparisons between not only different S3Rs but also top systems in VCC2020 with supervised representations. Systematic objective and subjective evaluations were conducted, and we show that S3R is comparable with VCC2020 top systems in the A2O setting in terms of similarity, and achieves state-of-the-art in S3R-based A2A VC. We believe the extensive analysis, as well as the toolkit itself, contribute to not only the S3R community but also the VC community. The codebase is now open-sourced.

Code Implementations: 2 repos