AS · LG · SD · Sep 25, 2023

BiSinger: Bilingual Singing Voice Synthesis

arXiv:2309.14089v3 · 7 citations · h-index: 80 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for code-switch SVS in pop music, which separate per-language models currently hinder. The advance is incremental, since it builds on existing TTS and SVS techniques.

The paper tackles multilingual singing voice synthesis (SVS) with BiSinger, a bilingual system for English and Chinese Mandarin. A shared representation lets a single model cover both languages, improving English and code-switch SVS while maintaining Chinese song quality.

Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Chinese Mandarin. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at https://bisinger-svs.github.io.
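The shared representation described in the abstract can be pictured with a minimal sketch. It assumes, purely for illustration, that English words are looked up in a CMU-dictionary-style table and that Mandarin pinyin syllables are mapped onto the same ARPAbet symbols. The pinyin mapping table, the dropping of stress digits and tones, and every name in the code are hypothetical and are not the paper's actual mapping rules.

```python
import re

# Tiny excerpt in the style of the CMU Pronouncing Dictionary
# (ARPAbet symbols with stress digits on vowels).
CMU_EXCERPT = {
    "love": ["L", "AH1", "V"],
    "you": ["Y", "UW1"],
}

# HYPOTHETICAL pinyin-syllable -> ARPAbet rules, for illustration only.
# These are NOT the mapping rules used by BiSinger.
PINYIN_TO_ARPABET = {
    "wo": ["W", "AO"],
    "ai": ["AY"],
    "ni": ["N", "IY"],
}


def strip_stress(phoneme):
    """Remove the stress digit so both languages share one symbol set (an assumption)."""
    return re.sub(r"\d", "", phoneme)


def to_shared_phonemes(tokens):
    """Convert code-switched lyric tokens into one shared phoneme sequence."""
    phonemes = []
    for token in tokens:
        key = token.lower()
        if key in CMU_EXCERPT:
            # English word: dictionary lookup, then drop stress markers.
            phonemes.extend(strip_stress(p) for p in CMU_EXCERPT[key])
        elif key in PINYIN_TO_ARPABET:
            # Mandarin syllable (toneless pinyin): apply the illustrative rules.
            phonemes.extend(PINYIN_TO_ARPABET[key])
        else:
            raise KeyError(f"no pronunciation available for token: {token!r}")
    return phonemes


# A code-switched lyric fragment: two pinyin syllables followed by an English word.
shared = to_shared_phonemes(["wo", "ai", "you"])
print(shared)
```

Because both lookups emit symbols from one inventory, a single acoustic model could in principle consume Chinese, English, and mixed lyrics without a per-language front end, which is the property the abstract attributes to the language-independent representation.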

Code Implementations: 1 repo
