SD AS, Mar 18, 2019

A Vocoder Based Method For Singing Voice Extraction

arXiv:1903.07554v2, 19 citations, Has Code
AI Analysis

This paper addresses clean vocal extraction for audio processing applications, but it appears incremental because it builds on existing vocoder and deep learning techniques.

The paper tackles the problem of extracting the singing voice from a musical mixture by using a convolutional network to estimate vocoder parameters from the spectrogram, which are then used to synthesize the vocal track without backing track interference. It evaluates the system through objective metrics and subjective comparisons against NMF and deep learning benchmarks.

This paper presents a novel method for extracting the vocal track from a musical mixture. The musical mixture consists of a singing voice and a backing track, which may comprise various instruments. We use a convolutional network with skip and residual connections as well as dilated convolutions to estimate vocoder parameters, given the spectrogram of an input mixture. The estimated parameters are then used to synthesize the vocal track, without any interference from the backing track. We evaluate our system through objective metrics pertinent to audio quality and interference from background sources, and via a comparative subjective evaluation. We use open-source source separation systems based on Non-negative Matrix Factorization (NMF) and Deep Learning methods as benchmarks for our system and discuss future applications for this particular algorithm.
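
As a rough illustration of the building blocks named in the abstract (dilated convolutions with residual and skip connections), the following is a minimal pure-Python sketch and not the authors' network. The abstract does not specify layer counts, kernel sizes, activations, or the vocoder parameterisation, so the `tanh` activation, the single-channel input, and all function names here are assumptions made for illustration only.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Same-length 1-D convolution along time with zero padding.

    x: list of floats (one feature track over time frames)
    kernel: odd-length list of filter taps
    dilation: spacing between taps, in frames
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(x):  # treat frames outside the signal as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x, kernel, dilation):
    """Return (residual_output, skip_output) for one block.

    The tanh nonlinearity is an assumption, not taken from the paper.
    """
    h = [math.tanh(v) for v in dilated_conv1d(x, kernel, dilation)]
    residual = [a + b for a, b in zip(x, h)]  # residual connection: input + branch
    return residual, h  # h also feeds the skip path


def dilated_stack(x, kernels, dilations):
    """Run blocks in sequence and sum their skip outputs."""
    skip_sum = [0.0] * len(x)
    for kernel, dilation in zip(kernels, dilations):
        x, skip = residual_block(x, kernel, dilation)
        skip_sum = [s + k for s, k in zip(skip_sum, skip)]
    return skip_sum


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel size of 3 and dilations of 1, 2, and 4, `receptive_field(3, [1, 2, 4])` covers 15 frames, which shows why dilation lets a shallow stack see a long temporal context. In a full system, the summed skip output would be mapped to per-frame vocoder parameters by further layers, which this sketch omits.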

Code Implementations: 1 repo
