AS · CL · SD · Oct 29, 2019

A Novel Cross-lingual Voice Cloning Approach with a Few Text-free Samples

arXiv:1910.13276v2 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses cross-lingual voice cloning: generating speech in a target language in the voice of a speaker of a different language, using only a small amount of audio data. The contribution appears incremental, since it builds on existing methods.

The paper tackles cross-lingual voice cloning by using bottleneck features from a speaker-independent ASR model and a latent prosody model to bridge language and speaker gaps, achieving better naturalness and similarity than baselines with few text-free samples.

In this paper, we present a cross-lingual voice cloning approach. Bottleneck (BN) features obtained from a speaker-independent ASR (SI-ASR) model are used as a bridge across speaker and language boundaries. The relationship between text and BN features is modeled by the latent prosody model. The acoustic model learns the mapping from BN features to acoustic features, and is fine-tuned with a few samples of the target speaker to realize voice cloning. The system can generate speech for arbitrary utterances in the target language in the voice of a speaker of another language. We verify that, with a small amount of audio data, our proposed approach handles cross-lingual tasks well. In intra-lingual tasks, our proposed approach also outperforms the baseline approach in naturalness and similarity.
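
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every model below is a stand-in linear map, and all names and sizes (`si_asr`, `prosody_model`, `base_acoustic`, `target_voice`, `BN_DIM`, and so on) are hypothetical. It only shows why the adaptation step needs no transcripts: BN features come from the audio itself, so fine-tuning the BN-to-acoustic map uses audio alone, while text enters only at synthesis time through the prosody model.

```python
import random

BN_DIM = 4        # hypothetical bottleneck (BN) feature size
ACOUSTIC_DIM = 3  # hypothetical acoustic feature size
AUDIO_DIM = 5     # hypothetical raw audio frame size
TEXT_DIM = 6      # hypothetical text (linguistic) feature size


def random_matrix(rows, cols, rng):
    """Random weights standing in for a trained model."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def apply(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mse(weights, inputs, targets):
    """Mean squared error of the linear map over a set of frames."""
    total, count = 0.0, 0
    for vec, target in zip(inputs, targets):
        pred = apply(weights, vec)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(target)
    return total / count


def fine_tune(weights, bn_frames, acoustic_frames, lr=0.02, epochs=300):
    """Adapt the BN -> acoustic map with plain SGD on squared error."""
    tuned = [row[:] for row in weights]  # leave the base model untouched
    for _ in range(epochs):
        for bn, target in zip(bn_frames, acoustic_frames):
            pred = apply(tuned, bn)
            for i in range(len(tuned)):
                err = pred[i] - target[i]
                for j in range(len(bn)):
                    tuned[i][j] -= lr * err * bn[j]
    return tuned


rng = random.Random(0)
si_asr = random_matrix(BN_DIM, AUDIO_DIM, rng)            # audio -> BN (stand-in)
prosody_model = random_matrix(BN_DIM, TEXT_DIM, rng)      # text -> BN (stand-in)
base_acoustic = random_matrix(ACOUSTIC_DIM, BN_DIM, rng)  # BN -> acoustic, before adaptation
# Hidden map used only to fabricate toy target-speaker data for this sketch.
target_voice = random_matrix(ACOUSTIC_DIM, BN_DIM, rng)

# A few text-free samples: audio frames only, no transcripts.
audio = [[rng.uniform(-1.0, 1.0) for _ in range(AUDIO_DIM)] for _ in range(8)]
bn_frames = [apply(si_asr, frame) for frame in audio]
acoustic_frames = [apply(target_voice, bn) for bn in bn_frames]

before = mse(base_acoustic, bn_frames, acoustic_frames)
cloned_acoustic = fine_tune(base_acoustic, bn_frames, acoustic_frames)
after = mse(cloned_acoustic, bn_frames, acoustic_frames)

# Synthesis: text features -> BN features -> acoustic frame in the cloned voice.
text_features = [rng.uniform(-1.0, 1.0) for _ in range(TEXT_DIM)]
frame = apply(cloned_acoustic, apply(prosody_model, text_features))

print("error before fine-tuning:", before)
print("error after fine-tuning:", after)
```

In a real system each stand-in would be a neural network and the acoustic output would be passed to a vocoder; the sketch keeps only the order of the stages and the fact that adaptation touches the acoustic model alone.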

