SD · AS · Dec 21, 2021

Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training

arXiv:2112.11459v1
AI Analysis

This work addresses speech enhancement for audio processing applications, representing an incremental improvement over existing methods.

The paper tackles the challenge of reducing the performance gap between estimated and target speech signals in self-supervised learning for monaural speech enhancement by proposing a multi-task pre-training method, which outperforms state-of-the-art approaches on a benchmark dataset.

In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve speech enhancement performance with self-supervised learning. Within the pre-training autoencoder (PAE), only a limited set of clean speech signals is required to learn their latent representations. Meanwhile, to overcome the limitation of a single pre-task, the proposed masking module exploits the dereverberated mask and the estimated ratio mask to denoise the mixture as the second pre-task. Unlike the PAE, where the target speech signals are estimated, the downstream task autoencoder (DAE) utilizes a large number of unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The trained DAE is shared by the learned representations and masks. Experimental results on a benchmark dataset demonstrate that the proposed method outperforms state-of-the-art approaches.
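
The abstract mentions ratio masks without defining them. As a rough illustration only, the sketch below shows a generic ratio mask applied to a magnitude spectrogram. It is not the paper's masking module: the function names `ratio_mask` and `apply_mask`, the toy values, and the assumption that speech and noise magnitudes add in each time-frequency bin are all hypothetical simplifications.

```python
def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Generic ratio mask per time-frequency bin: |S| / (|S| + |N|).

    Inputs are 2-D lists (frames x frequency bins) of non-negative magnitudes.
    This is a textbook-style mask, assumed here for illustration only.
    """
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def apply_mask(mixture_mag, mask):
    """Scale each mixture bin by the corresponding mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Toy magnitudes (hypothetical values): 2 frames x 2 frequency bins.
speech = [[1.0, 0.5], [0.0, 2.0]]
noise = [[1.0, 0.5], [1.0, 0.0]]

# Simplifying assumption: magnitudes add in the mixture.
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

mask = ratio_mask(speech, noise)
enhanced = apply_mask(mixture, mask)
```

Under the additive-magnitude assumption, masking the mixture recovers the speech magnitudes up to the small `eps` term. In practice the mask has to be estimated by a model because the clean speech and noise are not available at inference time.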
