AS, LG, SD | Jun 13, 2023

Unsupervised speech enhancement with deep dynamical generative speech and noise models

arXiv:2306.07820v1 | 4 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work provides an incremental improvement in unsupervised speech enhancement. Its noise-dependent configuration shortens inference time, which could benefit applications such as hearing aids or communication systems.

The authors tackled unsupervised speech enhancement by replacing a non-negative matrix factorization noise model with a deep dynamical generative model, achieving competitive performance with state-of-the-art methods and significantly faster inference in noise-dependent configurations.

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process.
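The abstract does not spell out how the clean speech is recovered once the speech and noise models are fitted. In the DVAE and NMF line of work that this paper extends, the estimate is typically obtained with a Wiener-like filter built from the per-bin speech and noise variances. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `wiener_filter`, the argument names, and the toy values are all hypothetical, and the variances are taken as given rather than produced by a DVAE or DDGM.

```python
def wiener_filter(noisy_stft, speech_var, noise_var, eps=1e-12):
    """Apply a Wiener-like gain to each time-frequency bin.

    noisy_stft: list of frames, each a list of complex STFT coefficients.
    speech_var: same shape, non-negative speech variances (assumed given,
                e.g. by a speech model such as a DVAE decoder).
    noise_var:  same shape, non-negative noise variances (assumed given,
                e.g. by a noise model such as NMF or a DDGM).
    eps:        small constant that avoids division by zero.
    """
    enhanced = []
    for x_row, vs_row, vn_row in zip(noisy_stft, speech_var, noise_var):
        row = []
        for x, vs, vn in zip(x_row, vs_row, vn_row):
            gain = vs / (vs + vn + eps)  # real gain between 0 and 1
            row.append(gain * x)         # scales magnitude, keeps phase
        enhanced.append(row)
    return enhanced


# Toy example: 2 frames x 2 frequency bins (values are made up).
noisy = [[1 + 1j, 2 + 0j], [0 + 2j, 4 - 4j]]
speech_var = [[1.0, 3.0], [0.0, 1.0]]
noise_var = [[1.0, 1.0], [1.0, 3.0]]
enhanced = wiener_filter(noisy, speech_var, noise_var)
```

The gain tends to 1 where the speech variance dominates and to 0 where the noise variance dominates, so the quality of the noise model directly shapes the enhanced signal.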

