Singing voice separation: a study on training data
This work addresses singing voice separation for audio processing applications; its contribution is incremental, as it focuses on training dataset design rather than introducing new separation methods.
The study investigated how training dataset characteristics impact the performance of state-of-the-art singing voice separation algorithms, finding that separation quality and diversity are key complementary factors for effective datasets.
In recent years, singing voice separation systems have shown improved performance thanks to supervised training. The design of the training dataset is known to be a crucial factor in the performance of such systems. We investigate how the characteristics of the training dataset impact the separation performance of state-of-the-art singing voice separation algorithms. We show that separation quality and diversity are two important and complementary assets of a good training dataset. We also provide insights into possible transforms for data augmentation in this task.
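To make the idea of data augmentation for this task concrete, the following is a minimal sketch of two transforms commonly applied to separated stems: independent random gain and channel swapping. The function name `augment_stems`, the gain range, and the choice of transforms are illustrative assumptions; the abstract does not specify which transforms the study evaluates.

```python
import numpy as np


def augment_stems(vocals, accompaniment, rng, gain_db_range=(-6.0, 6.0)):
    """Hypothetical augmentation of a (vocals, accompaniment) training pair.

    Both inputs are float arrays of shape (channels, samples).
    Returns the new mixture and the transformed stems.
    """

    def random_gain(x):
        # Draw a gain in decibels and convert it to a linear amplitude factor.
        gain_db = rng.uniform(*gain_db_range)
        return x * 10.0 ** (gain_db / 20.0)

    # Scale each stem independently so the vocal-to-accompaniment ratio varies.
    vocals = random_gain(vocals)
    accompaniment = random_gain(accompaniment)

    # For stereo input, swap left and right with probability 0.5.
    # The same swap is applied to both stems to keep them aligned.
    if vocals.shape[0] == 2 and rng.random() < 0.5:
        vocals = vocals[::-1]
        accompaniment = accompaniment[::-1]

    # Rebuild the mixture from the transformed stems so that the
    # training target and the network input remain consistent.
    mixture = vocals + accompaniment
    return mixture, vocals, accompaniment


# Usage with synthetic signals standing in for real stems.
rng = np.random.default_rng(0)
vocals = rng.standard_normal((2, 1000))
accompaniment = rng.standard_normal((2, 1000))
mixture, aug_vocals, aug_accompaniment = augment_stems(vocals, accompaniment, rng)
```

Recomputing the mixture after transforming the stems, rather than transforming the mixture directly, keeps the input equal to the sum of its targets, which is the property a supervised separation model is trained against.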