Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning
This work addresses audio source separation for applications such as music production and audio analysis. It is incremental, building on existing weakly supervised methods with specific improvements.
The paper tackles the problem of separating audio mixtures into individual instrument tracks by introducing a weakly supervised approach using deep adversarial learning with a Wasserstein distance loss and a spectrum energy preservation regularization. The method performs competitively against state-of-the-art weakly supervised methods on public benchmarks.
Separating audio mixtures into individual instrument tracks has been a long-standing challenge. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning. Specifically, our loss function adopts the Wasserstein distance, which directly measures, for each individual source, the distance between the distribution of the separated sources and that of the real sources. Moreover, a global regularization term is added to enforce the spectrum energy preservation property regardless of the separation. Unlike state-of-the-art weakly supervised models, which often involve deliberately devised constraints or careful model selection, our approach needs little prior model specification on the data and can be learned straightforwardly in an end-to-end fashion. We show that the proposed method performs competitively on public benchmarks against state-of-the-art weakly supervised methods.
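
The loss described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each source has its own critic producing scalar scores, that the Wasserstein distance is estimated as the difference of mean critic scores, and that energy preservation means the separated magnitude spectra should sum to the mixture spectrum. The function names, the weight `lam`, and the squared-error form of the penalty are hypothetical choices made for illustration, and the Lipschitz constraint on the critics is omitted.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Loss minimized by one source's critic (assumed form).

    The empirical Wasserstein estimate is mean(real) - mean(fake);
    the critic maximizes it, so it minimizes the negation.
    """
    return fmean(fake_scores) - fmean(real_scores)


def energy_penalty(mixture, separated):
    """Mean squared gap between the mixture spectrum and the summed sources.

    mixture: list of magnitudes, one per frequency bin.
    separated: list of such lists, one per separated source.
    """
    summed = [sum(bins) for bins in zip(*separated)]
    return fmean([(m - s) ** 2 for m, s in zip(mixture, summed)])


def separator_loss(fake_scores_per_source, mixture, separated, lam=1.0):
    """Adversarial term summed over sources plus the global regularizer.

    lam is a hypothetical weight on the energy preservation term.
    """
    adversarial = -sum(fmean(scores) for scores in fake_scores_per_source)
    return adversarial + lam * energy_penalty(mixture, separated)


if __name__ == "__main__":
    mixture = [1.0, 2.0, 3.0]
    separated = [[0.5, 1.0, 1.0], [0.5, 1.0, 2.0]]  # sums exactly to the mixture
    scores = [[0.2, 0.4], [0.1, 0.3]]  # made-up critic outputs per source
    print(energy_penalty(mixture, separated))
    print(separator_loss(scores, mixture, separated))
```

In this toy input the separated spectra sum exactly to the mixture, so the penalty vanishes and only the adversarial term contributes to the separator's loss.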