SD | LG | AS | Nov 1, 2023

Low-latency Real-time Voice Conversion on CPU

arXiv:2311.00873v1 | 4 citations | h-index: 6 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses low-latency voice conversion for real-time applications. The advance is incremental: it adapts architectures from prior audio manipulation and generation methods rather than introducing a new one.

The paper tackles real-time any-to-one voice conversion by adapting existing neural network architectures, achieving a latency under 20ms at 16kHz and running nearly 2.8x faster than real-time on a consumer CPU.

We adapt the architectures of previous audio manipulation and generation neural networks to the task of real-time any-to-one voice conversion. Our resulting model, LLVC (Low-latency Low-resource Voice Conversion), has a latency of under 20 ms at a sampling rate of 16 kHz and runs nearly 2.8x faster than real-time on a consumer CPU. LLVC uses both a generative adversarial architecture and knowledge distillation to attain this performance. To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model. We provide open-source samples, code, and pretrained model weights at https://github.com/KoeAI/LLVC.
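
To make the headline figures concrete, the short sketch below works out what they imply in samples and wall-clock time. It is only back-of-the-envelope arithmetic, not code from the LLVC repository. The function names are hypothetical, and it assumes that "2.8x faster than real-time" means one second of audio is processed in 1/2.8 seconds.

```python
# Figures quoted in the abstract above; everything else is illustrative.
SAMPLE_RATE_HZ = 16_000   # 16 kHz audio
LATENCY_BOUND_MS = 20     # "latency of under 20ms"
SPEEDUP = 2.8             # "nearly 2.8x faster than real-time"


def samples_in_window(sample_rate_hz: int, window_ms: float) -> int:
    """Number of audio samples spanned by a window of the given duration."""
    return round(sample_rate_hz * window_ms / 1000)


def compute_time_ms(audio_ms: float, speedup: float) -> float:
    """Wall-clock time needed to process audio_ms of audio at a given speedup.

    Assumption: a speedup of s means audio is processed s times faster
    than it plays back.
    """
    return audio_ms / speedup


if __name__ == "__main__":
    # Samples covered by the 20 ms latency bound at 16 kHz.
    print(samples_in_window(SAMPLE_RATE_HZ, LATENCY_BOUND_MS))  # 320
    # Approximate compute time for one second of audio, in milliseconds.
    print(round(compute_time_ms(1000, SPEEDUP), 1))  # 357.1
```

Under these assumptions, the 20 ms bound corresponds to at most 320 samples of audio, and the CPU is busy for roughly 36% of the time while converting a live stream.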

Code Implementations: 1 repo
