LGAINESDASOct 24, 2019

A Recurrent Variational Autoencoder for Speech Enhancement

arXiv:1910.10942v291 citations
Originality Incremental advance
AI Analysis

This work addresses speech enhancement for audio processing applications, presenting an incremental improvement over previous feed-forward methods.

The paper tackles speech enhancement by proposing a recurrent variational autoencoder (RVAE) trained on clean speech and combined with a noise model, using a variational EM algorithm to fine-tune the encoder at test time, which improves results by inducing temporal dynamics in latent variables.

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes