SD CV LG AS · Nov 2, 2022

A weighted-variance variational autoencoder model for speech enhancement

Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel

arXiv:2211.00990v22.22 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for audio processing applications, presenting an incremental improvement over existing variational autoencoder methods.

The paper proposes a weighted-variance variational autoencoder for speech enhancement. Placing a Gamma prior on the per-frame weights turns the speech generative model into a Student's t-distribution rather than a Gaussian. In experiments on spectrogram auto-encoding and speech enhancement, the model is reported to be more effective and robust than the standard unweighted-variance model.

We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
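
The link between a Gamma prior on the weights and a Student's t-distribution can be checked numerically. The sketch below is illustrative only and is not the authors' implementation: it uses a real-valued variable instead of complex TF coefficients, and it assumes the parametrization w ~ Gamma(shape=nu/2, rate=nu/2) with x | w ~ N(0, sigma2 / w), for which the marginal of x is a Student's t-distribution with nu degrees of freedom. The names `sample_weighted_gaussian`, `nu`, and `sigma2` are hypothetical.

```python
import numpy as np


def sample_weighted_gaussian(nu, sigma2, n, rng):
    """Draw n samples from a Gaussian whose variance is divided by a Gamma weight.

    Assumed parametrization (not taken from the paper):
    w ~ Gamma(shape=nu/2, rate=nu/2), so E[w] = 1, and x | w ~ N(0, sigma2 / w).
    """
    # NumPy's gamma takes a scale parameter, i.e. scale = 1 / rate = 2 / nu.
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # One Gaussian draw per weight; the standard deviation is sqrt(sigma2 / w).
    return rng.normal(0.0, np.sqrt(sigma2 / w))


rng = np.random.default_rng(0)
nu, sigma2, n = 5.0, 1.0, 200_000

x_mix = sample_weighted_gaussian(nu, sigma2, n, rng)    # weighted-variance model
x_t = np.sqrt(sigma2) * rng.standard_t(df=nu, size=n)   # direct Student's t draws
x_gauss = rng.normal(0.0, np.sqrt(sigma2), size=n)      # unweighted Gaussian

# Compare upper quantiles: the mixture should track the Student's t samples
# and sit above the Gaussian ones, reflecting its heavier tails.
for q in (0.9, 0.99, 0.999):
    print(q, np.quantile(x_mix, q), np.quantile(x_t, q), np.quantile(x_gauss, q))
```

Under these assumptions, small weights inflate the variance of individual samples, which is what produces the heavier tails relative to the unweighted Gaussian.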
