SD AI AS · Apr 8, 2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

arXiv:2104.03538v233.6300 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the gap between training objectives and perceptual quality in speech enhancement, offering incremental improvements for applications like hearing aids or communication systems.

The paper aims to improve speech enhancement by aligning model training with human auditory perception. It proposes MetricGAN+, which raises the PESQ score by 0.3 over MetricGAN and reaches a state-of-the-art PESQ score of 3.15 on the VoiceBank-DEMAND dataset.

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory. Objective evaluation metrics that consider human perception can hence serve as a bridge to reduce the gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which introduces three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ can increase the PESQ score by 0.3 compared to the previous MetricGAN and achieve state-of-the-art results (PESQ score = 3.15).
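The idea of connecting a metric to a discriminator can be sketched as follows. This is a minimal, hedged illustration of the general MetricGAN-style objective, not the authors' code: the function names are hypothetical, the PESQ range of -0.5 to 4.5 used for normalization is an assumption, and the three MetricGAN+ training techniques are not shown. The point it illustrates is that the metric enters only as a scalar regression target for the discriminator, so no gradient ever has to flow through the metric itself.

```python
def normalize_pesq(pesq_score, lo=-0.5, hi=4.5):
    """Map a raw PESQ score onto [0, 1].

    Assumption: PESQ lies in [-0.5, 4.5]; other metrics need their own range.
    """
    return (pesq_score - lo) / (hi - lo)


def discriminator_loss(d_enhanced, d_clean, q_enhanced):
    """Squared-error loss that makes the discriminator imitate the metric.

    d_enhanced: discriminator output for the (enhanced, clean) pair
    d_clean:    discriminator output for the (clean, clean) pair
    q_enhanced: normalized metric score of the enhanced speech, a plain number
    """
    # The clean reference scores 1.0 by definition of the normalized metric.
    return (d_enhanced - q_enhanced) ** 2 + (d_clean - 1.0) ** 2


def generator_loss(d_enhanced, target=1.0):
    """Push the discriminator's predicted score for enhanced speech toward the target."""
    return (d_enhanced - target) ** 2


# Example with made-up discriminator outputs and the PESQ value quoted above.
q = normalize_pesq(3.15)
print(discriminator_loss(d_enhanced=0.5, d_clean=0.9, q_enhanced=q))
print(generator_loss(d_enhanced=0.5))
```

In a real system the discriminator outputs would come from a neural network and the losses would be averaged over a batch. The generator is updated through the discriminator, which is differentiable, rather than through the metric.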

