SD | AI | AS | Sep 5, 2021

A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

arXiv:2109.02011v1 | 20 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for noisy audio and offers an incremental improvement over existing CycleGAN-based methods.

The paper tackles two weaknesses of CycleGAN-based speech enhancement systems: residual noise that propagates through the cycle, and a phase that is left unaltered. It proposes a two-stage denoising approach in which a CycleGAN enhances the magnitude and a complex spectral refining network then estimates the phase and suppresses the remaining noise. Experimental results on two public datasets show that the proposed method consistently outperforms previous one-stage CycleGANs and other state-of-the-art systems on various metrics, particularly in background noise suppression.

Cycle-consistent generative adversarial networks (CycleGAN) have shown promising performance for speech enhancement (SE). One intractable shortcoming of these CycleGAN-based SE systems, however, is that the noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems only estimate the spectral magnitude, while the phase is left unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancing network with a subsequent complex spectral refining network. Specifically, in the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. The second stage then further suppresses the residual noise components and estimates the clean phase with a complex spectral mapping network, a pure complex-valued network composed of complex 2D convolution/deconvolution and complex temporal-frequency attention blocks. Experimental results on two public datasets demonstrate that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems in terms of various evaluation metrics, especially in background noise suppression.
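The data flow between the two stages can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (couple_with_noisy_phase, real_corr2d, complex_corr2d) are hypothetical, the 0.5 scaling is only a stand-in for a trained CycleGAN magnitude estimator, and a single naive complex kernel stands in for the learned complex convolution and attention blocks. It shows only how an estimated magnitude is paired with the noisy phase, and how one complex 2D convolution can be written as four real-valued ones.

```python
import numpy as np

def couple_with_noisy_phase(enhanced_mag, noisy_spec):
    # Stage-1 output: the estimated magnitude reuses the phase of the noisy input.
    return enhanced_mag * np.exp(1j * np.angle(noisy_spec))

def real_corr2d(x, k):
    # Naive "valid" 2D cross-correlation on real arrays (for illustration only).
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def complex_corr2d(x, k):
    # Complex product rule applied to convolution:
    # (xr + j*xi)(kr + j*ki) = (xr*kr - xi*ki) + j*(xr*ki + xi*kr)
    real = real_corr2d(x.real, k.real) - real_corr2d(x.imag, k.imag)
    imag = real_corr2d(x.real, k.imag) + real_corr2d(x.imag, k.real)
    return real + 1j * imag

rng = np.random.default_rng(0)

# Toy noisy complex spectrum with shape (frequency bins, time frames).
noisy_spec = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# Assumption: halving the magnitude stands in for the CycleGAN estimate.
enhanced_mag = 0.5 * np.abs(noisy_spec)

# Coarsely enhanced complex spectrum: new magnitude, original noisy phase.
coarse_spec = couple_with_noisy_phase(enhanced_mag, noisy_spec)

# Assumption: one random complex kernel stands in for the stage-2 network.
kernel = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
refined = complex_corr2d(coarse_spec, kernel)
```

Because the second operation acts on the real and imaginary parts jointly, it can change the phase as well as the magnitude, which a magnitude-only first stage cannot do.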

