SDLGAS · Feb 11, 2021

Speech enhancement with mixture-of-deep-experts with clean clustering pre-training

arXiv:2102.06034v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for noisy environments, offering incremental improvements in robustness and efficiency.

The authors tackled single-microphone speech enhancement by proposing a mixture of deep experts (MoDE) architecture, which improved robustness to unfamiliar noise types and reduced test-time complexity.

In this study we present a mixture of deep experts (MoDE) neural-network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an 'expert' in a different speech spectral pattern, such as a phoneme. A gating DNN estimates the latent variables, namely the weights assigned to each expert's output given a speech segment. The experts estimate a mask from the noisy input, and the final mask is obtained as a weighted average of the experts' estimates, with the weights determined by the gating DNN. A soft spectral attenuation based on the estimated mask is then applied to enhance the noisy speech signal. As a byproduct, test-time complexity is reduced. We show that the experts' specialization improves robustness to unfamiliar noise types.
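The mask-combination step described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each expert and the gating network are replaced by a single random linear layer (the hypothetical `expert_weights` and `gate_weights`), the input features are log-magnitude STFT frames, the gate uses a per-frame softmax, and the soft spectral attenuation is approximated by a plain multiplicative gain on the noisy STFT.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    """Elementwise logistic function, mapping to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def mode_enhance(noisy_stft, expert_weights, gate_weights):
    """Hypothetical MoDE-style forward pass on one utterance.

    noisy_stft:     complex array (T, F), noisy STFT frames.
    expert_weights: list of K arrays (F, F); each stands in for an expert DNN
                    as one linear layer plus sigmoid (assumption).
    gate_weights:   array (F, K); stands in for the gating DNN (assumption).
    Returns the enhanced STFT, the combined mask, and the gate weights.
    """
    # Log-magnitude features of the noisy input (assumed feature choice).
    feats = np.log(np.abs(noisy_stft) + 1e-8)  # (T, F)

    # Each expert estimates a mask in (0, 1) from the noisy input.
    expert_masks = np.stack([sigmoid(feats @ W) for W in expert_weights])  # (K, T, F)

    # The gate assigns one weight per expert per frame; weights sum to 1.
    gates = softmax(feats @ gate_weights, axis=-1)  # (T, K)

    # Final mask: per-frame weighted average of the experts' masks.
    mask = np.einsum("tk,ktf->tf", gates, expert_masks)  # (T, F)

    # Attenuation: scale each time-frequency bin by the mask (a simple
    # multiplicative gain; the paper's exact attenuation rule may differ).
    enhanced = mask * noisy_stft
    return enhanced, mask, gates


# Usage with random stand-in data and weights.
rng = np.random.default_rng(0)
T, F, K = 50, 129, 4  # frames, frequency bins, experts (arbitrary sizes)
noisy = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
experts = [rng.standard_normal((F, F)) * 0.01 for _ in range(K)]
gate = rng.standard_normal((F, K)) * 0.01
enhanced, mask, gates = mode_enhance(noisy, experts, gate)
```

Because the gate weights sum to one and every expert mask lies in (0, 1), the combined mask also stays in (0, 1), so this form of attenuation can only reduce the magnitude of each time-frequency bin.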
