SD · LG · AS · Oct 31, 2018

Audio Source Separation Using Variational Autoencoders and Weak Class Supervision

arXiv:1810.13104v3 · 28 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of reducing annotation effort for audio source separation. The advance is incremental, since it builds on existing VAE and non-negative models.

The paper tackles audio source separation by training a model on mixtures annotated only with the class labels of the sources present, without access to isolated signals, using a variational autoencoder (VAE) as a prior for each source class. The results show that this weak class supervision achieves separation performance comparable to full source signal supervision, as demonstrated on mixtures of digit utterances.

In this paper, we propose a source separation method that is trained by observing the mixtures and the class labels of the sources present in the mixture, without any access to isolated sources. Since our method does not require source class labels for every time-frequency bin but only a single label for each source constituting the mixture signal, we call this scenario weak class supervision. We associate a variational autoencoder (VAE) with each source class within a non-negative (compositional) model. Each VAE provides a prior model to identify the signal from its associated class in a sound mixture. After training the model on mixtures, we obtain a generative model for each source class and demonstrate our method on one-second mixtures of utterances of digits from 0 to 9. We show that the separation performance obtained by source class supervision is as good as the performance obtained by source signal supervision.
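The compositional idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the decoder weights are random and untrained, the softplus output layer, the dimensions, and the ratio-mask separation step are all assumptions chosen for illustration, and the names `decoders`, `decode`, and `separate` are hypothetical. It only shows how one non-negative decoder per class, selected by the mixture's class labels, can be summed into a mixture estimate and used to split the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, LATENT_DIM, N_FREQ = 10, 4, 16  # illustrative sizes, not from the paper

# Hypothetical per-class decoder weights (random and untrained; assumption).
decoders = [rng.normal(size=(LATENT_DIM, N_FREQ)) for _ in range(N_CLASSES)]

def decode(class_id, z):
    """Map latent frames z (T, LATENT_DIM) to a non-negative spectrogram (T, N_FREQ)."""
    # Softplus keeps every output strictly positive (assumed output layer).
    return np.logaddexp(0.0, z @ decoders[class_id])

def separate(mixture, present, latents, eps=1e-12):
    """Split a non-negative mixture among the classes listed in `present`."""
    # Only decoders of classes named in the weak label contribute.
    estimates = {c: decode(c, latents[c]) for c in present}
    total = sum(estimates.values()) + eps
    # Ratio masks: each class receives its share of every time-frequency bin.
    return {c: mixture * est / total for c, est in estimates.items()}

T = 5                      # number of time frames
present = [3, 7]           # weak label: digits 3 and 7 occur in the mixture
latents = {c: rng.normal(size=(T, LATENT_DIM)) for c in present}
mixture = sum(decode(c, latents[c]) for c in present)  # synthetic mixture
sources = separate(mixture, present, latents)
```

In a trained system the latents would come from per-class encoders and the decoders would be fitted on labeled mixtures; here both are random, so the sketch demonstrates only the additive, label-gated structure and the fact that the separated parts sum back to the mixture.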

Code Implementations: 1 repo
