SD MM AS

Oct 13, 2021

Singer separation for karaoke content generation

Hsuan-Yu Lin, Xuanjun Chen, Jyh-Shing Roger Jang

arXiv:2110.06707v42.3

h-index: 35

Originality: Incremental advance

AI Analysis

This work addresses a specific need in karaoke applications by enabling the extraction of individual lead vocals, though it is incremental as it builds on existing voice separation techniques.

The paper tackles the problem of separating lead singers from music for karaoke content generation, proposing a singer separation system that handles male-female duets or vocal harmonies, and introduces three models with an automatic selection scheme and a publicly released dataset.

Thanks to the rapid development of deep learning, singing voice can now be separated successfully from monaural music. However, such separation only extracts human voices as a whole from the accompanying instruments, which is insufficient for karaoke content generation, where only the lead singers need to be separated. For this karaoke application, we need to separate music containing male and female duets into two vocals, or extract a single lead vocal from music containing vocal harmony. We therefore propose a singer separation system that generates karaoke content for one or two separated lead singers. In particular, we introduce three models for the singer separation task and design an automatic model selection scheme that determines how many lead singers are in a song. We also collected a sufficiently large dataset, MIR-SingerSeparation, which has been publicly released to advance research in this area. Our singer separation is most suitable for sentimental ballads and can be directly applied to karaoke content generation. To the best of our knowledge, this is the first singer separation work for real-world karaoke applications.
