AS · LG · MM · SD · ML · Dec 2, 2019

Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation

arXiv:1912.02591v3 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses singing voice separation, an audio processing task, but its contribution is incremental: it focuses on architectural variations within existing U-Net frameworks.

The paper tackles singing voice separation by evaluating various intermediate blocks in U-Net architectures. Its best-performing block achieves state-of-the-art SDR on the MUSDB dataset, 0.9 dB above the previous best result.
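SDR (signal-to-distortion ratio) is a logarithmic energy ratio, so a 0.9 dB margin corresponds to a fixed relative reduction in error energy. The sketch below assumes the plain definition SDR = 10·log10(‖s‖² / ‖s − ŝ‖²) for a reference s and an estimate ŝ. This is a simplification: MUSDB results are normally computed with BSS Eval v4 (for example via the `museval` package), which allows a distortion filter and aggregates framewise scores, so its numbers will differ. The function name `sdr` is illustrative.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Plain signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Simplified energy-ratio form (assumption); NOT the full BSS Eval v4 metric
    used for official MUSDB scores.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite for a perfect estimate or a silent reference
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))

# Example: an estimate with 10% amplitude error has SDR = 10*log10(1 / 0.01) = 20 dB.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
voice = np.sin(2 * np.pi * 440.0 * t)
print(round(sdr(voice, 0.9 * voice), 3))

# A 0.9 dB gain means the error energy shrinks by a factor of 10**0.09 (about 1.23).
better = voice - (0.1 / np.sqrt(10 ** 0.09)) * voice
print(round(sdr(voice, better), 3))
```

Because SDR is logarithmic, the same 0.9 dB margin means the same relative error reduction regardless of the absolute SDR level of the baseline.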

Singing Voice Separation (SVS) tries to separate the singing voice from a given mixed musical signal. Recently, many U-Net-based models have been proposed for the SVS task, but no existing work had evaluated and compared the various types of intermediate blocks that can be used in the U-Net architecture. In this paper, we introduce a variety of intermediate spectrogram transformation blocks. We implement U-Nets based on these blocks and train them on complex-valued spectrograms to consider both magnitude and phase. These networks are then compared on the SDR metric. The network using a particular block composed of convolutional and fully-connected layers achieves state-of-the-art SDR on the MUSDB singing voice separation task by a large margin of 0.9 dB. Our code and models are available online.
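To make "train on complex-valued spectrograms" and "a block composed of convolutional and fully-connected layers" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes (1) the complex STFT is fed to the network as two real channels (real and imaginary parts), and (2) the hypothetical block applies a 3×3 convolution with ReLU, then a fully-connected layer that mixes frequency bins within each time frame, added back residually. All names (`stft`, `complex_spec_to_channels`, `conv2d_same`, `conv_fc_block`) and sizes are illustrative choices.

```python
import numpy as np

def stft(signal, n_fft=32, hop=8):
    """Minimal Hann-windowed STFT -> complex array of shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)], axis=1)   # (n_fft, frames)
    return np.fft.rfft(frames, axis=0)

def complex_spec_to_channels(spec):
    """Stack real and imaginary parts into a (2, F, T) real tensor, so the
    network sees both magnitude and phase information."""
    return np.stack([spec.real, spec.imag], axis=0)

def conv2d_same(x, kernels, bias):
    """Naive 'same'-padded 2D cross-correlation (as in DL frameworks).
    x: (c_in, F, T); kernels: (c_out, c_in, k, k) with odd k; bias: (c_out,)."""
    c_out, c_in, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))        # zero padding
    _, F, T = x.shape
    out = np.empty((c_out, F, T))
    for o in range(c_out):
        acc = np.full((F, T), bias[o], dtype=np.float64)
        for i in range(c_in):
            for di in range(k):
                for dj in range(k):
                    acc += kernels[o, i, di, dj] * xp[i, di:di + F, dj:dj + T]
        out[o] = acc
    return out

def relu(x):
    return np.maximum(x, 0.0)

def conv_fc_block(x, kernels, bias, w_freq, b_freq):
    """Hypothetical conv + fully-connected block.
    Conv captures local time-frequency patterns; the FC layer (w_freq: (F, F),
    shared across channels and frames) mixes all frequency bins of each frame."""
    h = relu(conv2d_same(x, kernels, bias))                 # (c_out, F, T)
    fc = np.einsum("gf,cft->cgt", w_freq, h) + b_freq[None, :, None]
    return h + relu(fc)                                     # residual connection

rng = np.random.default_rng(0)
mixture = rng.standard_normal(256)                          # stand-in audio
x = complex_spec_to_channels(stft(mixture))                 # (2, 17, 29)
kernels = 0.1 * rng.standard_normal((4, 2, 3, 3))
w_freq = 0.1 * rng.standard_normal((17, 17))
y = conv_fc_block(x, kernels, np.zeros(4), w_freq, np.zeros(17))
print(y.shape)
```

The fully-connected layer is what lets a block relate distant frequency bins (for example, harmonics of a sung note), which a small convolution kernel alone cannot reach. In a full U-Net such a block would sit at each encoder and decoder level, with `w_freq` sized to that level's downsampled frequency resolution.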

Code Implementations: 1 repo
