SD CL AS
Oct 12, 2021

MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech

arXiv:2110.05866v1
61 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of unsupervised speech enhancement for applications where clean speech data is unavailable, representing an incremental advance over prior unsupervised methods.

The paper tackles the problem of training speech enhancement models without clean speech or noise data. It proposes MetricGAN-U, which is trained on noisy speech alone by optimizing non-intrusive quality metrics. The reported results show MetricGAN-U outperforming the baselines in both objective and subjective evaluations.

Most deep learning-based speech enhancement models are trained in a supervised manner, which requires pairs of noisy and clean speech during training. Consequently, much of the noisy speech recorded in daily life cannot be used to train such models. Although some unsupervised learning frameworks have been proposed to relax the pairing constraint, they still require clean speech or noise for training. Therefore, in this paper, we propose MetricGAN-U (MetricGAN-unsupervised), which further relaxes the constraints of conventional unsupervised learning. In MetricGAN-U, only noisy speech is required: the model is trained by optimizing non-intrusive speech quality metrics. The experimental results verify that MetricGAN-U outperforms the baselines in both objective and subjective metrics.
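
The abstract does not spell out the training objective. As a rough sketch only, assuming the usual MetricGAN-style setup adapted to a non-intrusive metric, training alternates between two losses. The symbols are notation introduced here and are not taken from this page: G is the enhancement model, D is a learned surrogate of the metric, Q' is the non-intrusive quality metric normalized to [0, 1], x is a noisy utterance, and s is the target score (typically 1).

```latex
% Assumed MetricGAN-style objectives (illustrative, not quoted from the source).
% Surrogate step: D learns to predict the metric score of both enhanced and noisy speech.
\mathcal{L}_{D} = \mathbb{E}_{x}\left[ \bigl(D(G(x)) - Q'(G(x))\bigr)^{2} + \bigl(D(x) - Q'(x)\bigr)^{2} \right]

% Enhancement step: G is updated so that the surrogate's predicted score approaches s.
\mathcal{L}_{G} = \mathbb{E}_{x}\left[ \bigl(D(G(x)) - s\bigr)^{2} \right]
```

Under these assumptions, neither loss involves a clean reference signal: Q' scores a single waveform on its own, which is what allows training on noisy speech alone. Because Q' need not be differentiable, gradients reach G only through the surrogate D.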

Code Implementations: 2 repos