SD AS · Feb 1, 2018

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

arXiv:1802.00300v1 · 12.0 · 26 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the problem of separating singing voices from single-channel music mixtures for audio processing applications, representing an incremental improvement over existing methods.

The paper tackles monaural singing voice separation with a deep learning method that learns long-term temporal patterns. It improves the signal-to-distortion ratio (SDR) by 0.37 dB and the signal-to-interference ratio (SIR) by 0.23 dB over the previous state of the art (SOTA).

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning-based methods. In this work we present a novel deep learning-based method that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique that regularizes a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an improvement of 0.37 dB in signal-to-distortion ratio (SDR) and of 0.23 dB in signal-to-interference ratio (SIR) compared to previous SOTA results.
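The abstract describes Twin Networks only in one sentence, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes the regularizer penalizes the distance between the hidden states of the forward network, passed through a learned affine map, and the hidden states of the backward-running copy at the same time step. The function name `twin_penalty`, the use of a squared Euclidean distance, and the toy values are all assumptions made for illustration.

```python
def twin_penalty(forward_states, backward_states, weight, bias):
    """Average squared distance between mapped forward states and backward states.

    forward_states[t] and backward_states[t] are hidden vectors (lists of floats)
    for the same time step t. The backward list is assumed to be re-ordered
    already, so that index t lines up with the forward list.
    """
    total = 0.0
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        # Hypothetical affine map g(h) = W h + b applied to the forward state.
        mapped = [sum(w * h for w, h in zip(row, h_fwd)) + b
                  for row, b in zip(weight, bias)]
        # Squared Euclidean distance to the backward state (an assumption).
        total += sum((m - hb) ** 2 for m, hb in zip(mapped, h_bwd))
    return total / len(forward_states)


# Toy example: two time steps, two hidden units, identity map.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
fwd = [[1.0, 2.0], [0.0, 1.0]]
bwd = [[1.0, 2.0], [3.0, 5.0]]
penalty = twin_penalty(fwd, bwd, identity, zero_bias)
```

In a training loop, a term like this would typically be added to the separation loss with a small weighting factor, and the backward copy would be discarded at inference time. Both the weighting and the exact distance used in MaD TwinNet should be taken from the paper and its released code.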
