Semi-blind source separation with multichannel variational autoencoder
This is an incremental improvement in audio signal processing: it addresses source separation with a hybrid approach that combines a learned generative model of source spectrograms with separation-matrix estimation.
The paper tackles semi-blind source separation by proposing a multichannel variational autoencoder (MVAE) method, which models source power spectrograms with a conditional VAE and achieves better separation performance than a baseline method in experiments.
This paper proposes a multichannel source separation technique called the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture. By training the CVAE using the spectrograms of training examples with source-class labels, we can use the trained decoder distribution as a universal generative model capable of generating spectrograms conditioned on a specified class label. By treating the latent space variables and the class label as the unknown parameters of this generative model, we can develop a convergence-guaranteed semi-blind source separation algorithm that consists of iteratively estimating the power spectrograms of the underlying sources as well as the separation matrices. In experimental evaluations, our MVAE produced better separation performance than a baseline method.
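The alternating structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not say how the separation matrices are updated, so the sketch assumes the iterative projection (IP) rule commonly used in determined multichannel separation. It also replaces the trained CVAE decoder with a hypothetical stand-in, `standin_variances`, that simply averages each source's power over frequency. All function names and array shapes here are illustrative.

```python
import numpy as np


def separate(X, W):
    """Apply separation matrices: y_j(f, n) = w_j(f)^H x(f, n)."""
    # X: (F, N, I) mixture STFT, W: (F, J, I) with row j holding w_j^H.
    return np.einsum("fji,fni->fnj", W, X)


def neg_log_likelihood(X, W, V):
    """Negative log-likelihood (up to constants) under a zero-mean
    complex Gaussian source model with variances V of shape (F, N, J)."""
    Y = separate(X, W)
    n_frames = X.shape[1]
    _, logabsdet = np.linalg.slogdet(W)
    fit = np.sum(np.log(V) + np.abs(Y) ** 2 / V)
    return fit - 2.0 * n_frames * np.sum(logabsdet)


def standin_variances(Y, eps=1e-10):
    """HYPOTHETICAL stand-in for the CVAE decoder output (an assumption,
    not the paper's model): one variance per source and frame, shared
    across all frequency bins."""
    r = np.maximum(np.mean(np.abs(Y) ** 2, axis=0, keepdims=True), eps)
    return np.broadcast_to(r, Y.shape)


def ip_update(X, W, V):
    """One sweep of iterative projection over all sources (assumed rule)."""
    n_freq, n_frames, n_ch = X.shape
    W = W.copy()
    for j in range(n_ch):
        # Weighted covariance: (1/N) * sum_n x x^H / v_j(f, n).
        Xw = X / V[:, :, j, None]
        Sigma = np.einsum("fni,fnk->fik", Xw, X.conj()) / n_frames
        # Solve (W Sigma) u = e_j for every frequency bin.
        e_j = np.zeros((n_freq, n_ch, 1), dtype=complex)
        e_j[:, j, 0] = 1.0
        u = np.linalg.solve(W @ Sigma, e_j)
        # Normalize so that u^H Sigma u = 1.
        scale = np.real(np.einsum("fia,fik,fka->f", u.conj(), Sigma, u))
        u = u / np.sqrt(scale)[:, None, None]
        W[:, j, :] = u[:, :, 0].conj()  # store w_j^H as row j
    return W


def run_alternation(X, n_iter=10):
    """Alternate variance estimation and separation-matrix updates."""
    n_freq, _, n_ch = X.shape
    W = np.tile(np.eye(n_ch, dtype=complex), (n_freq, 1, 1))
    costs = []
    for _ in range(n_iter):
        V = standin_variances(separate(X, W))  # source model step
        W = ip_update(X, W, V)                 # spatial model step
        costs.append(neg_log_likelihood(X, W, V))
    return W, costs
```

Each step in this sketch minimizes the same negative log-likelihood with the other variables held fixed, so the recorded cost never increases from one iteration to the next. In the MVAE setting, the stand-in would be replaced by the variances produced by the trained decoder, with the latent variables and class label of each source updated so that the cost likewise does not increase.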