AS LG MM SD ML | Dec 2, 2019

Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation

Woosung Choi, Minseok Kim, Jaehwa Chung, Daewon Lee, Soonyoung Jung

arXiv:1912.02591v312.26 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses singing voice separation, an audio processing task. It is an incremental advance because it focuses on architectural variations within the existing U-Net framework.

The paper tackles singing voice separation by evaluating various intermediate blocks in U-Net architectures. Its best model achieves state-of-the-art SDR on the MUSDB singing voice separation task, surpassing the previous best result by 0.9 dB.

Singing Voice Separation (SVS) aims to separate the singing voice from a given mixed musical signal. Recently, many U-Net-based models have been proposed for the SVS task, but no existing work has evaluated and compared the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the SDR metric. A U-Net using a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
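
To make two ideas from the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes a naive DFT in place of a real STFT, represents a complex-valued spectrum as two real-valued channels (one common way to let a network see both magnitude and phase), and uses a simplified SDR defined as the ratio of reference energy to error energy. Benchmark evaluations typically rely on a more elaborate BSS Eval style procedure, so the numbers from this sketch are not comparable to reported results. All function names are hypothetical.

```python
import cmath
import math


def dft_frame(frame):
    """Naive DFT of one real-valued frame; returns the non-negative-frequency bins."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def as_real_imag_channels(spectrum):
    """Split complex bins into two real-valued channels (real part, imaginary part)."""
    return [z.real for z in spectrum], [z.imag for z in spectrum]


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is an illustrative energy ratio, not the BSS Eval metric.
    """
    signal_power = sum(s * s for s in reference)
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


# Toy signals: one sine period as the "true" voice, a scaled copy as the estimate.
reference = [math.sin(2 * math.pi * t / 8) for t in range(8)]
estimate = [0.9 * s for s in reference]

real_channel, imag_channel = as_real_imag_channels(dft_frame(reference))
sdr_db = simple_sdr(reference, estimate)
```

In this toy case the estimate differs from the reference by 10% in amplitude, so the simplified SDR comes out at roughly 20 dB. On this simplified ratio, a 0.9 dB gain would correspond to about 19% less residual error energy for the same reference.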
