SD, AS | Feb 1, 2018

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

arXiv:1802.00300v1 | 26 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of separating singing voices from single-channel music mixtures for audio processing applications, representing an incremental improvement over existing methods.

The paper tackles monaural singing voice separation with a deep learning method that learns long-term temporal patterns, improving signal-to-distortion ratio (SDR) by 0.37 dB and signal-to-interference ratio (SIR) by 0.23 dB over the previous state of the art (SOTA).

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods. In this work we present a novel deep learning based method that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique to regularize a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an improvement in signal-to-distortion ratio (SDR) of 0.37 dB and in signal-to-interference ratio (SIR) of 0.23 dB, compared to previous SOTA results.
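To give a feel for the size of the reported gains, the sketch below computes a simplified SDR and converts a dB gain into a power ratio. It is an illustration under stated assumptions, not the paper's evaluation code: published results of this kind are normally computed with the BSS Eval family of metrics, which decompose the error into interference and artifact terms, whereas `simple_sdr_db` here treats the whole residual as distortion. The function names and the toy sine-wave signal are hypothetical.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: the entire residual counts as distortion. This is NOT the
    BSS Eval SDR used for published separation results.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in reference)
    if signal_power == 0.0:
        raise ValueError("reference signal is silent; SDR is undefined")
    error_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_power == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / error_power)


def db_gain_to_power_ratio(gain_db):
    """Convert a gain in dB to the corresponding ratio of power ratios."""
    return 10.0 ** (gain_db / 10.0)


# Toy reference: 0.1 s of a 440 Hz sine sampled at 8 kHz (hypothetical data).
reference = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
# Toy estimate with a uniform 10% amplitude error, giving roughly 20 dB.
estimate = [0.9 * s for s in reference]

sdr = simple_sdr_db(reference, estimate)
sdr_gain = db_gain_to_power_ratio(0.37)  # the SDR improvement from the abstract
sir_gain = db_gain_to_power_ratio(0.23)  # the SIR improvement from the abstract
print(f"toy SDR: {sdr:.2f} dB")
print(f"0.37 dB -> x{sdr_gain:.3f}, 0.23 dB -> x{sir_gain:.3f}")
```

Read this way, a 0.37 dB improvement corresponds to a signal-to-distortion power ratio about 9% higher, and 0.23 dB to a signal-to-interference power ratio about 5% higher, which is consistent with describing the result as incremental.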
