AS | AI | SD | SP | Dec 28, 2024

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation

arXiv:2412.20048v1 | 3 citations | h-index: 9 | IEEE Transactions on Audio, Speech, and Language Processing
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating natural speech in multiple languages while maintaining a consistent speaker identity, which matters for applications such as multilingual virtual assistants, though it appears to be an incremental advance on prior cross-lingual synthesis work.

The paper tackles the language-speaker entanglement problem in cross-lingual speech synthesis by proposing CrossSpeech++, which decouples language and speaker generation into separate modules, achieving significant improvements over existing methods.

The goal of this work is to generate natural speech in multiple languages while maintaining the same speaker identity, a task known as cross-lingual speech synthesis. A key challenge of cross-lingual speech synthesis is the language-speaker entanglement problem, which causes the quality of cross-lingual systems to lag behind that of intra-lingual systems. In this paper, we propose CrossSpeech++, which effectively disentangles language and speaker information and significantly improves the quality of cross-lingual speech synthesis. To this end, we break the complex speech generation pipeline into two simple components: language-dependent and speaker-dependent generators. The language-dependent generator produces linguistic variations that are not biased by specific speaker attributes. The speaker-dependent generator models acoustic variations that characterize speaker identity. By handling each type of information in separate modules, our method can effectively disentangle language and speaker representation. We conduct extensive experiments using various metrics, and demonstrate that CrossSpeech++ achieves significant improvements in cross-lingual speech synthesis, outperforming existing methods by a large margin.
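The two-stage split described in the abstract can be illustrated with a toy sketch. This is not the CrossSpeech++ architecture: the function names (`language_generator`, `speaker_generator`, `synthesize`), the hidden size `DIM`, and the seeded pseudo-embeddings are all hypothetical stand-ins for learned networks. The only point it shows is the interface: the language-dependent stage never receives the speaker, and the speaker-dependent stage operates on its output.

```python
import random

DIM = 4  # hypothetical hidden size, chosen only for illustration


def _vec(key, dim=DIM):
    """Deterministic pseudo-embedding for a string key (stand-in for a learned embedding)."""
    rng = random.Random(key)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def language_generator(tokens, language):
    """Language-dependent stage: sees only the text and the language ID, never the speaker."""
    lang_vec = _vec("lang:" + language)
    return [
        [t + l for t, l in zip(_vec("tok:" + tok), lang_vec)]
        for tok in tokens
    ]


def speaker_generator(linguistic_frames, speaker):
    """Speaker-dependent stage: adds speaker-specific variation to the linguistic frames."""
    spk_vec = _vec("spk:" + speaker)
    return [[x + s for x, s in zip(frame, spk_vec)] for frame in linguistic_frames]


def synthesize(tokens, language, speaker):
    """Compose the two stages; the speaker enters only in the second one."""
    return speaker_generator(language_generator(tokens, language), speaker)


if __name__ == "__main__":
    frames_a = synthesize(["ho", "la"], "es", "alice")
    frames_b = synthesize(["ho", "la"], "es", "bob")
    print(len(frames_a), len(frames_a[0]))
    print(frames_a != frames_b)
```

Because `language_generator` has no speaker argument, its output is identical for every speaker by construction. In a real system that separation has to be learned, which is the disentanglement problem the abstract describes.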

