SD · LG · AS · SP · Nov 2, 2022

Fast and efficient speech enhancement with variational autoencoders

arXiv:2211.02728v1 · 6 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for audio processing applications. It is an incremental advance because it builds on existing variational autoencoder frameworks.

The paper tackles the cost of the E-step in unsupervised VAE-based speech enhancement, where existing solvers are either computationally heavy or inefficient. It proposes a Langevin-dynamics approach with total variation regularization that, according to the authors, strikes an effective balance between computational efficiency and enhancement quality and outperforms existing methods.
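A minimal sketch of the core idea, Langevin sampling of a latent sequence with a total variation penalty that couples neighbouring time frames, may help make it concrete. The abstract does not give the exact form of the regularizer or of the posterior, so this assumes an L1 total variation across frames and replaces the VAE latent posterior with a toy Gaussian; `grad_log_post`, `mu`, `lam` and `step` are hypothetical names and values, not the authors' settings. Several chains are run in parallel and averaged, loosely mirroring the "multiple sequences of samples" described in the abstract.

```python
import numpy as np

def tv_subgradient(z):
    """Subgradient of TV(z) = sum_t ||z[..., t, :] - z[..., t-1, :]||_1.

    z has shape (..., T, L): time on axis -2, latent dimension on axis -1.
    """
    d = np.sign(np.diff(z, axis=-2))   # sign of z_t - z_{t-1}, shape (..., T-1, L)
    g = np.zeros_like(z)
    g[..., 1:, :] += d                 # each |z_t - z_{t-1}| pushes z_t by +sign
    g[..., :-1, :] -= d                # ... and z_{t-1} by -sign
    return g

def tv_langevin(grad_log_post, z0, n_chains, n_steps, step, lam, rng):
    """Unadjusted Langevin dynamics on log p(z | x) - lam * TV(z).

    Runs n_chains independent chains in parallel, all started at z0 of shape (T, L);
    returns the final samples with shape (n_chains, T, L).
    """
    z = np.broadcast_to(z0, (n_chains,) + z0.shape).astype(float)
    for _ in range(n_steps):
        drift = grad_log_post(z) - lam * tv_subgradient(z)
        z = z + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    return z

# Toy usage (hypothetical): a Gaussian stand-in for the VAE latent posterior,
# centred on a smooth latent path so the TV term is a sensible prior.
rng = np.random.default_rng(0)
T, L = 50, 4
mu = 2.0 + np.cumsum(0.1 * rng.standard_normal((T, L)), axis=0)
grad_log_post = lambda z: -(z - mu) / 0.25   # gradient of log N(z; mu, 0.25 I)
samples = tv_langevin(grad_log_post, np.zeros((T, L)), n_chains=4,
                      n_steps=500, step=1e-2, lam=0.5, rng=rng)
z_hat = samples.mean(axis=0)                 # average over the parallel chains
```

Vectorizing over chains keeps the cost of extra sample sequences low compared with looping, which is one plausible reason a multi-chain Langevin sampler can trade off speed and quality well; a real implementation would obtain `grad_log_post` from the decoder by automatic differentiation.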

Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods. This approach involves the use of a pre-trained deep speech prior along with a parametric noise model, where the noise parameters are learned from the noisy speech signal with an expectation-maximization (EM)-based method. The E-step involves an intractable latent posterior distribution. Existing algorithms to solve this step are either based on computationally heavy Markov chain Monte Carlo (MCMC) sampling methods and variational inference, or inefficient optimization-based methods. In this paper, we propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors. Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
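The abstract describes the EM structure without equations. In earlier VAE-based unsupervised enhancement work, the parametric noise model is typically non-negative matrix factorization (NMF) of the noise variance, the noisy STFT coefficients are modeled as zero-mean complex Gaussians with variance `g[n] * v_fn(z_n) + (W @ H)[f, n]`, and the M-step uses multiplicative majorization-minimization updates summed over posterior samples. The sketch below assumes that setup, which this abstract does not confirm: given R latent samples from the E-step, it updates the NMF noise factors and forms the speech estimate by averaging Wiener gains. The random `Vs` array is a hypothetical stand-in for the decoder's speech variances at those samples, and the gains `g` are held fixed.

```python
import numpy as np

def nmf_noise_mstep(P, Vs, W, H, g, n_iter=1):
    """Multiplicative (majorization-minimization) updates of NMF noise factors.

    P:  |X|**2, observed power spectrogram, shape (F, N)
    Vs: speech variances evaluated at R posterior samples, shape (R, F, N)
    W, H: strictly positive noise factors, shapes (F, K) and (K, N)
    g:  per-frame gains, shape (N,), held fixed here
    """
    for _ in range(n_iter):
        Vx = g * Vs + W @ H                      # noisy-speech variance per sample
        H = H * np.sqrt((W.T @ (P * np.sum(Vx ** -2, axis=0)))
                        / (W.T @ np.sum(Vx ** -1, axis=0)))
        Vx = g * Vs + W @ H                      # refresh with the new H
        W = W * np.sqrt(((P * np.sum(Vx ** -2, axis=0)) @ H.T)
                        / (np.sum(Vx ** -1, axis=0) @ H.T))
    return W, H

def wiener_estimate(X, Vs, W, H, g):
    """Speech STFT estimate: Wiener gain averaged over the R samples, applied to X."""
    gVs = g * Vs
    return np.mean(gVs / (gVs + W @ H), axis=0) * X

def neg_log_likelihood(P, Vs, W, H, g):
    """Sum over samples of the complex-Gaussian negative log-likelihood (up to constants)."""
    Vx = g * Vs + W @ H
    return np.sum(np.log(Vx) + P / Vx)

# Toy usage: random data stand in for a real STFT and for decoder outputs.
rng = np.random.default_rng(0)
F, N, K, R = 16, 20, 3, 5
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
P = np.abs(X) ** 2
Vs = rng.gamma(2.0, 0.5, size=(R, F, N))   # hypothetical decoder variances
g = np.ones(N)
W0 = rng.random((F, K)) + 0.1
H0 = rng.random((K, N)) + 0.1
W, H = nmf_noise_mstep(P, Vs, W0, H0, g, n_iter=10)
S_hat = wiener_estimate(X, Vs, W, H, g)
```

Only the noise factors are updated here; a full EM iteration would alternate this step with an E-step that draws the latent samples, and may also re-estimate the gains.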
