A Style Transfer Approach to Source Separation
Supervised source separation requires paired mixture-clean training data. This approach removes that requirement, which could benefit audio processing applications where such pairs are scarce.
The paper recasts source separation as a style transfer task: cycle-consistent variational auto-encoders learn a mapping from mixtures to clean sounds without paired training examples.
Training neural networks for source separation involves presenting a mixture recording at the input of the network and updating network parameters in order to produce an output that resembles the clean source. Consequently, supervised source separation depends on the availability of paired mixture-clean training examples. In this paper, we interpret source separation as a style transfer problem. We present a variational auto-encoder network that exploits the commonality across the domain of mixtures and the domain of clean sounds and learns a shared latent representation across the two domains. Using these cycle-consistent variational auto-encoders, we learn a mapping from the mixture domain to the domain of clean sounds and perform source separation without explicitly supervising with paired training examples.
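The training signal described above can be illustrated with a minimal sketch. The abstract does not specify the architecture or the loss terms, so everything below is an assumption made for illustration: toy linear encoders and decoders (the hypothetical `LinearVAE` class), a mean-squared-error reconstruction term, a KL term toward a standard normal prior, and a cycle term that translates a sample to the other domain and back. Any further terms the full method may use are omitted. The point of the sketch is that the mixture and the clean example are never required to correspond to each other.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LinearVAE:
    """Toy linear VAE for one domain (hypothetical, for illustration only)."""

    def __init__(self, n_features, n_latent, rng):
        self.W_mu = rand_matrix(n_latent, n_features, rng)
        self.W_logvar = rand_matrix(n_latent, n_features, rng)
        self.W_dec = rand_matrix(n_features, n_latent, rng)

    def encode(self, x):
        # Returns the mean and log-variance of the approximate posterior.
        return matvec(self.W_mu, x), matvec(self.W_logvar, x)

    def decode(self, z):
        return matvec(self.W_dec, z)


def sample(mu, logvar, rng):
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def unpaired_losses(vae_mix, vae_clean, x_mix, x_clean, rng):
    """Loss terms for one UNPAIRED (mixture, clean) draw; assumed form."""
    mu_m, lv_m = vae_mix.encode(x_mix)
    z_m = sample(mu_m, lv_m, rng)
    mu_c, lv_c = vae_clean.encode(x_clean)
    z_c = sample(mu_c, lv_c, rng)

    # Within-domain VAE terms: each domain reconstructs its own input.
    recon = mse(vae_mix.decode(z_m), x_mix) + mse(vae_clean.decode(z_c), x_clean)
    kl = kl_to_standard_normal(mu_m, lv_m) + kl_to_standard_normal(mu_c, lv_c)

    # Cycle mixture -> clean -> mixture through the shared latent space.
    mu_fc, lv_fc = vae_clean.encode(vae_clean.decode(z_m))
    cyc_mix = vae_mix.decode(sample(mu_fc, lv_fc, rng))

    # Cycle clean -> mixture -> clean.
    mu_fm, lv_fm = vae_mix.encode(vae_mix.decode(z_c))
    cyc_clean = vae_clean.decode(sample(mu_fm, lv_fm, rng))

    cycle = mse(cyc_mix, x_mix) + mse(cyc_clean, x_clean)
    return {"recon": recon, "kl": kl, "cycle": cycle}


def separate(vae_mix, vae_clean, x_mix):
    """Inference: encode with the mixture encoder, decode with the clean decoder."""
    mu, _ = vae_mix.encode(x_mix)
    return vae_clean.decode(mu)


if __name__ == "__main__":
    rng = random.Random(0)
    n_features, n_latent = 8, 3  # arbitrary toy sizes
    vae_mix = LinearVAE(n_features, n_latent, rng)
    vae_clean = LinearVAE(n_features, n_latent, rng)
    # Two independent random frames stand in for unpaired spectrogram frames.
    x_mix = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    x_clean = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    print(unpaired_losses(vae_mix, vae_clean, x_mix, x_clean, rng))
    print(separate(vae_mix, vae_clean, x_mix))
```

In a real system the linear maps would be neural networks trained by gradient descent on a weighted sum of these terms; the weights and the optimizer are left out here because the abstract does not state them.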