LG AI NE SD AS | Oct 24, 2019

A Recurrent Variational Autoencoder for Speech Enhancement

Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

arXiv:1910.10942v2 | 15.191 citations

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for audio processing applications, presenting an incremental improvement over previous feed-forward methods.

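The paper proposes a recurrent variational autoencoder (RVAE) for speech enhancement. The RVAE is trained on clean speech only and is combined with a nonnegative matrix factorization noise model. A variational EM algorithm fine-tunes the RVAE encoder at test time to approximate the posterior of the latent variables given the noisy observations. Because the model is recurrent, it induces temporal dynamics in that posterior, which is shown to improve enhancement results over feed-forward fully-connected architectures.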

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.
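The way a speech variance model and an NMF noise model combine can be illustrated with a minimal sketch. It assumes the noise variance at each time-frequency bin is the product of two nonnegative matrices, and that the clean speech is estimated with a Wiener-like gain; the abstract above does not spell out the estimator, so that choice is an assumption. The names `nmf_variance`, `wiener_gain`, and `enhance` are hypothetical, and `speech_var` is a fixed placeholder standing in for the variance that the RVAE decoder would produce.

```python
def nmf_variance(W, H):
    """Noise variance (W @ H)[f][n] for nonnegative W (F x K) and H (K x N)."""
    rank = len(H)
    n_frames = len(H[0])
    return [
        [sum(w_row[k] * H[k][n] for k in range(rank)) for n in range(n_frames)]
        for w_row in W
    ]


def wiener_gain(speech_var, noise_var):
    """Per-bin gain v_s / (v_s + v_b); an assumed Wiener-like filter."""
    return [
        [s / (s + b) for s, b in zip(s_row, b_row)]
        for s_row, b_row in zip(speech_var, noise_var)
    ]


def enhance(noisy_stft, speech_var, noise_var):
    """Scale each noisy STFT coefficient by its gain to estimate clean speech."""
    gain = wiener_gain(speech_var, noise_var)
    return [
        [g * x for g, x in zip(g_row, x_row)]
        for g_row, x_row in zip(gain, noisy_stft)
    ]


# Toy example: 2 frequency bins, 2 frames, NMF rank 1 (all values are made up).
W = [[1.0], [2.0]]
H = [[0.5, 1.0]]
noise_var = nmf_variance(W, H)

# Placeholder for the decoder output; not produced by a trained model here.
speech_var = [[0.5, 3.0], [1.0, 2.0]]

noisy_stft = [[2 + 2j, 4 + 0j], [0 + 2j, 1 - 1j]]
speech_estimate = enhance(noisy_stft, speech_var, noise_var)
```

In a full system the gain would typically be averaged over samples of the latent variables drawn from the fine-tuned encoder, rather than computed from a single fixed variance as in this sketch.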
