LG SD AS Oct 11, 2021

Unsupervised Source Separation via Bayesian Inference in the Latent Domain

Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

arXiv:2110.05313v44.44 citations Has Code

Originality: Highly original

AI Analysis

This work addresses the need for efficient unsupervised source separation in audio processing, offering a practical alternative to resource-intensive methods.

The paper tackles the problem of unsupervised audio source separation by proposing a method that uses deep Bayesian priors and operates in a latent domain, achieving results comparable to state-of-the-art supervised approaches on the Slakh dataset while requiring fewer resources than other unsupervised methods.

State-of-the-art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling resources. On the other hand, approaches for training these models without any direct supervision are typically highly demanding in terms of memory and time, and remain impractical at inference time. We aim to tackle these limitations by proposing a simple yet effective unsupervised separation algorithm, which operates directly on a latent representation of time-domain signals. Our algorithm relies on deep Bayesian priors in the form of pre-trained autoregressive networks to model the probability distribution of each source. We leverage the low cardinality of the discrete latent space, trained with a novel loss term imposing a precise arithmetic structure on it, to perform exact Bayesian inference without relying on an approximation strategy. We validate our approach on the Slakh dataset (arXiv:1909.08494), demonstrating results in line with state-of-the-art supervised approaches while requiring fewer resources than other unsupervised methods.
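The idea of exact inference over a small discrete latent space can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the four-entry codebook, the fixed prior tables (standing in for one step of a pre-trained autoregressive prior), and the "sum modulo K" likelihood are all assumptions chosen only to show that, when the number of codes is small, the posterior over pairs of source codes can be enumerated and normalized directly instead of approximated.

```python
from itertools import product


def exact_posterior(prior_a, prior_b, likelihood, mixture_code):
    """Return p(a, b | mixture_code) by enumerating every pair of codes.

    prior_a, prior_b: lists of probabilities over K discrete codes
                      (hypothetical stand-ins for per-source priors).
    likelihood:       callable (m, a, b) -> p(m | a, b), also hypothetical.
    """
    num_codes = len(prior_a)
    unnormalized = {}
    for a, b in product(range(num_codes), repeat=2):
        weight = prior_a[a] * prior_b[b] * likelihood(mixture_code, a, b)
        if weight > 0:
            unnormalized[(a, b)] = weight
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("mixture code has zero probability under the model")
    return {pair: w / total for pair, w in unnormalized.items()}


def toy_likelihood(m, a, b, num_codes=4):
    # Assumed arithmetic structure for illustration only:
    # the mixture code equals the sum of the source codes modulo K.
    return 1.0 if (a + b) % num_codes == m else 0.0


# Toy priors: source A favors code 0, source B favors code 3.
prior_a = [0.7, 0.1, 0.1, 0.1]
prior_b = [0.1, 0.1, 0.1, 0.7]

posterior = exact_posterior(prior_a, prior_b, toy_likelihood, mixture_code=3)
best_pair = max(posterior, key=posterior.get)
```

With K codes per source the enumeration costs K² likelihood evaluations per latent step, which is only tractable because the latent space has low cardinality.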
