AS SD Jul 29, 2020

On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li

arXiv:2007.14974v3 12.243 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better comparisons and loss function evaluation in GAN-based speech enhancement, offering incremental improvements for the audio processing domain.

The authors compare GAN-based speech enhancement systems with non-GAN state-of-the-art methods and evaluate several loss functions. Their proposed convolutional recurrent GAN (CRGAN) outperforms both GAN-based and non-GAN systems, and combining an objective metric loss with an MSE loss gives the best performance across multiple metrics.

Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement; however, these approaches have not been compared to state-of-the-art (SOTA) non-GAN-based approaches. Additionally, many loss functions have been proposed for GAN-based approaches, but they have not been adequately compared. In this study, we propose novel convolutional recurrent GAN (CRGAN) architectures for speech enhancement. Multiple loss functions are adopted to enable direct comparisons to other GAN-based systems. The benefits of including recurrent layers are also explored. Our results show that the proposed CRGAN model outperforms the SOTA GAN-based models using the same loss functions and it outperforms other non-GAN-based systems, indicating the benefits of using a GAN for speech enhancement. Overall, the CRGAN model that combines an objective metric loss function with the mean squared error (MSE) provides the best performance over comparison approaches across many evaluation metrics.
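The abstract does not give the exact form of the combined loss, so the following is only a minimal sketch of how an objective metric loss might be added to an MSE term. It assumes a weighted sum, and it assumes the metric term is the squared gap between a discriminator's predicted (normalized) quality score and the maximum score of 1.0. The function names, the `metric_weight` parameter, and the score normalization are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mse(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between an enhanced signal and the clean target."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


def metric_loss(predicted_score: float, target_score: float = 1.0) -> float:
    """Squared gap between the predicted metric score and the desired score.

    Assumption: the score is normalized so that 1.0 is the best value.
    """
    return (predicted_score - target_score) ** 2


def combined_loss(
    estimate: Sequence[float],
    target: Sequence[float],
    predicted_score: float,
    metric_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Weighted sum of the MSE term and the metric term (assumed form)."""
    return mse(estimate, target) + metric_weight * metric_loss(predicted_score)
```

In an actual GAN setup the predicted score would come from the discriminator evaluated on the generator's output; here it is passed in as a plain number to keep the sketch self-contained.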
