SD CL AS
Oct 12, 2021

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

arXiv:2110.05866v117.361 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of unsupervised speech enhancement for applications where clean speech data is unavailable, representing an incremental advance over prior unsupervised methods.

The paper tackles the problem of training speech enhancement models without needing clean speech or noise data, proposing MetricGAN-U which uses only noisy speech and optimizes non-intrusive quality metrics. The results show that MetricGAN-U outperforms baselines in objective and subjective evaluations.

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training. Consequently, several noisy speeches recorded in daily life cannot be used to train the model. Although certain unsupervised learning frameworks have also been proposed to solve the pair constraint, they still require clean speech or noise for training. Therefore, in this paper, we propose MetricGAN-U, which stands for MetricGAN-unsupervised, to further release the constraint from conventional unsupervised learning. In MetricGAN-U, only noisy speech is required to train the model by optimizing non-intrusive speech quality metrics. The experimental results verified that MetricGAN-U outperforms baselines in both objective and subjective metrics.
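The abstract does not spell out the training objective, so the following is only a minimal sketch of how optimizing a non-intrusive metric might look, assuming the usual MetricGAN-style setup: a surrogate discriminator is trained to predict the metric score of a signal, and the enhancement model is trained to push the discriminator's prediction toward a target score. The function names, the target value of 1.0, and the use of plain Python lists of scores are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def discriminator_loss(
    d_enhanced: Sequence[float],
    q_enhanced: Sequence[float],
    d_noisy: Sequence[float],
    q_noisy: Sequence[float],
) -> float:
    """Squared error between surrogate predictions and metric scores.

    d_* are the discriminator's predicted scores, q_* are the scores from
    a non-intrusive quality metric (computed without a clean reference),
    for enhanced and noisy utterances respectively.
    """
    n = len(d_enhanced)
    total = 0.0
    for de, qe, dn, qn in zip(d_enhanced, q_enhanced, d_noisy, q_noisy):
        total += (de - qe) ** 2 + (dn - qn) ** 2
    return total / n


def generator_loss(d_enhanced: Sequence[float], target: float = 1.0) -> float:
    """Push the predicted score of enhanced speech toward the target.

    target = 1.0 assumes metric scores are normalized to [0, 1].
    """
    return sum((de - target) ** 2 for de in d_enhanced) / len(d_enhanced)
```

In this sketch no clean speech appears anywhere: both losses depend only on noisy inputs, their enhanced versions, and metric scores computed without a reference signal.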
