SDAIAS, May 1, 2025

Voice Cloning: Comprehensive Survey

arXiv:2505.00579v1 | 9 citations | h-index: 12
Originality: Synthesis-oriented
AI Analysis

The survey addresses a need for clarity and organization in voice cloning research, for both researchers and practitioners, but its contribution is incremental because it compiles existing work rather than introducing new methods.

This survey establishes standardized terminology for voice cloning and compiles available algorithms, covering speaker adaptation, few-shot/zero-shot/multilingual TTS, evaluation metrics, and datasets to encourage research and limit misuse.

Voice cloning has advanced rapidly, with many researchers and corporations working to improve its algorithms for various applications. This article aims to establish a standardized terminology for voice cloning and to explore its different variations. It covers speaker adaptation as the fundamental concept and then examines few-shot, zero-shot, and multilingual TTS within that context. Finally, it reviews the evaluation metrics commonly used in voice cloning research and the related datasets. The survey compiles the available voice cloning algorithms to encourage research on both generating and detecting cloned speech, with the goal of limiting misuse.
