AS LG SD Jun 13, 2023

Unsupervised speech enhancement with deep dynamical generative speech and noise models

Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

arXiv:2306.07820v11.24 citations h-index: 32

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in unsupervised speech enhancement, potentially benefiting applications like hearing aids or communication systems by reducing computational time.

The authors tackled unsupervised speech enhancement by replacing a non-negative matrix factorization noise model with a deep dynamical generative model, achieving competitive performance with state-of-the-art methods and significantly faster inference in noise-dependent configurations.

This work builds on previous work on unsupervised speech enhancement that used a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process.
