SD · AS · ML · Nov 16, 2018

Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

arXiv:1811.06713v3 · 63 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement in noisy settings. It is an incremental advance because it builds on an existing multichannel local Gaussian modeling framework.
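
For reference, here is a minimal sketch of the multichannel local Gaussian model. It assumes the common formulation in which each time-frequency bin of the I-channel mixture `x` is a zero-mean complex Gaussian vector with covariance $\Sigma_x = v_s R_s + v_b R_b$. Here $v_s$ and $v_b$ are the speech and noise variances, and $R_s$ and $R_b$ are the spatial covariance matrices. In the paper the speech variance comes from the VAE decoder and the noise variance from NMF. In this sketch they are plain numbers, and the function names are hypothetical. The speech estimate is the multichannel Wiener filter $v_s R_s \Sigma_x^{-1} x$, which is the posterior mean of the speech image under this model.

```python
import numpy as np

def mixture_covariance(v_s, R_s, v_b, R_b):
    # Sigma_x = v_s R_s + v_b R_b for one time-frequency bin (I x I matrix)
    return v_s * R_s + v_b * R_b

def wiener_speech_estimate(x, v_s, R_s, v_b, R_b):
    # Multichannel Wiener filter: posterior mean of the speech image, v_s R_s Sigma_x^{-1} x
    sigma_x = mixture_covariance(v_s, R_s, v_b, R_b)
    return (v_s * R_s) @ np.linalg.solve(sigma_x, x)

def log_likelihood(x, v_s, R_s, v_b, R_b):
    # log N_c(x; 0, Sigma_x) = -I log(pi) - log det Sigma_x - x^H Sigma_x^{-1} x
    sigma_x = mixture_covariance(v_s, R_s, v_b, R_b)
    _, logdet = np.linalg.slogdet(sigma_x)
    quad = np.real(np.conj(x) @ np.linalg.solve(sigma_x, x))
    return float(-x.shape[0] * np.log(np.pi) - logdet - quad)
```

Two quick sanity checks follow from the formula. With identical spatial covariances and equal variances, the filter returns half the mixture. As $v_b \to 0$, it returns the mixture itself.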

The paper tackles speaker-independent multichannel speech enhancement in unknown noisy environments. It combines a variational autoencoder for supervised speech modeling with non-negative matrix factorization for unsupervised noise modeling, and reports that this approach outperforms an NMF-based counterpart in which speech is modeled with supervised NMF.
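
The unsupervised noise model can be illustrated with generic Itakura-Saito (IS) NMF. IS is the divergence that corresponds to maximum likelihood under a Gaussian variance model. This is a standalone sketch on a strictly positive power spectrogram `V` (frequencies × frames). It is not the paper's exact M-step, in which the noise factors are updated inside the EM loop. `is_nmf` and its parameters are hypothetical. The square-root exponent on the multiplicative updates follows the majorization-minimization analysis of β-divergence NMF by Févotte and Idier (2011), which makes the IS divergence non-increasing.

```python
import numpy as np

def is_divergence(V, V_hat):
    # Itakura-Saito divergence summed over all entries (V and V_hat must be > 0)
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def is_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    # Approximate V (F x N) by W @ H with W (F x rank) and H (rank x N), all positive
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Multiplicative update of the activations H
        V_hat = W @ H
        H *= ((W.T @ (V * V_hat ** -2)) / (W.T @ V_hat ** -1)) ** 0.5
        # Multiplicative update of the spectral templates W
        V_hat = W @ H
        W *= (((V * V_hat ** -2) @ H.T) / (V_hat ** -1 @ H.T)) ** 0.5
    return W, H
```

Multiplicative updates keep `W` and `H` positive without any projection step. This is one reason they are the usual choice for NMF under Gaussian variance models.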

In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for modeling the speech spectro-temporal content. The parameters of this supervised model are learned using the framework of variational autoencoders. The noisy recording environment is assumed to be unknown, so the noise spectro-temporal modeling remains unsupervised and is based on non-negative matrix factorization (NMF). We develop a Monte Carlo expectation-maximization algorithm and we experimentally show that the proposed approach outperforms its NMF-based counterpart, where speech is modeled using supervised NMF.
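
The Monte Carlo E-step can be sketched in a simplified single-channel setting. Assume the noisy coefficients $x_{fn}$ of one frame satisfy $x_{fn} \mid z_n \sim \mathcal{N}_c(0, \sigma_f^2(z_n) + v_{b,fn})$ with prior $z_n \sim \mathcal{N}(0, I)$. Here $\sigma_f^2(\cdot)$ is the decoder output and $v_{b,fn}$ is the NMF noise variance. The code draws samples of $z_n$ from its posterior with a random-walk Metropolis-Hastings sampler. This is one common choice and not necessarily the paper's exact sampler. It then averages the Wiener gain over the samples to estimate the speech coefficients, since $E[s \mid x] = E_{z \mid x}[\sigma^2(z) / (\sigma^2(z) + v_b)]\, x$. `decoder_var`, `noise_var` and all function names are hypothetical placeholders. A trained VAE decoder would replace `decoder_var`.

```python
import numpy as np

def log_posterior_unnorm(z, x_power, decoder_var, noise_var):
    # log p(x_n | z) + log p(z) up to an additive constant, with
    # x_fn | z ~ N_c(0, sigma_f^2(z) + v_b,f) and z ~ N(0, I); x_power = |x_fn|^2
    v_x = decoder_var(z) + noise_var
    return float(-np.sum(np.log(v_x) + x_power / v_x) - 0.5 * z @ z)

def metropolis_hastings(x_power, decoder_var, noise_var, dim,
                        n_samples=2000, burn_in=500, step=0.5, seed=0):
    # Random-walk Metropolis-Hastings targeting p(z | x_n)
    rng = np.random.default_rng(seed)
    z = np.zeros(dim)
    log_p = log_posterior_unnorm(z, x_power, decoder_var, noise_var)
    samples = []
    for t in range(burn_in + n_samples):
        proposal = z + step * rng.standard_normal(dim)
        log_p_prop = log_posterior_unnorm(proposal, x_power, decoder_var, noise_var)
        # Symmetric proposal, so the acceptance ratio is just the target ratio
        if np.log(rng.random()) < log_p_prop - log_p:
            z, log_p = proposal, log_p_prop
        if t >= burn_in:
            samples.append(z.copy())
    return np.array(samples)

def posterior_mean_speech(x, samples, decoder_var, noise_var):
    # Monte Carlo estimate of E[s | x]: average Wiener gain over posterior samples of z
    gains = np.array([decoder_var(z) / (decoder_var(z) + noise_var) for z in samples])
    return gains.mean(axis=0) * x
```

For a quick experiment, a toy decoder such as `decoder_var = lambda z: np.exp(A @ z)` for a fixed `F × dim` matrix `A` can stand in for the network. In a Monte Carlo EM scheme, sample averages of this kind also feed the M-step updates of the noise and spatial parameters.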
