SD LG AS, Feb 2, 2020

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

arXiv:2002.00319v1, 9 citations
AI Analysis

This paper addresses single-channel speech enhancement for noisy audio, but it appears incremental: it builds on existing convolutional recurrent networks with architectural refinements.

The authors tackle single-channel speech enhancement by proposing an end-to-end temporal convolutional recurrent network (TCRN) that directly maps noisy waveforms to clean waveforms, and they report that it consistently outperforms existing methods in speech intelligibility and quality.

In recent decades, neural-network-based methods have significantly improved the performance of speech enhancement. Most of them estimate a time-frequency (T-F) representation of the target speech, directly or indirectly, and then resynthesize the waveform from the estimated T-F representation. In this work, we propose the temporal convolutional recurrent network (TCRN), an end-to-end model that directly maps a noisy waveform to a clean waveform. The TCRN, which combines convolutional and recurrent neural networks, is able to efficiently and effectively leverage short-term and long-term information. Furthermore, we present an architecture that repeatedly downsamples and upsamples speech during forward propagation. We show that our model improves performance compared with existing convolutional recurrent networks. Furthermore, we present several key techniques to stabilize the training process. The experimental results show that our model consistently outperforms existing speech enhancement approaches in terms of speech intelligibility and quality.
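The data flow described in the abstract (waveform in, waveform out, with downsampling, a recurrent stage, and upsampling repeated during the forward pass) can be illustrated with a toy sketch. This is not the authors' TCRN: the block layout, the scalar recurrence, the kernel, and every name below (conv1d_strided, transposed_conv1d, simple_rnn, toy_enhance) are assumptions made for illustration, and the weights are fixed placeholders rather than learned parameters.

```python
import math
import random


def conv1d_strided(x, kernel, stride):
    """Valid 1-D convolution with a stride: shortens (downsamples) x."""
    k = len(kernel)
    return [
        sum(kernel[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


def transposed_conv1d(x, kernel, stride):
    """Transposed 1-D convolution: lengthens (upsamples) x."""
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out


def simple_rnn(x, w_in=1.0, w_rec=0.5):
    """Scalar Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, out = 0.0, []
    for v in x:
        h = math.tanh(w_in * v + w_rec * h)
        out.append(h)
    return out


def toy_enhance(noisy, n_blocks=2, kernel=(0.25, 0.25, 0.25, 0.25), stride=2):
    """Hypothetical forward pass: repeat (downsample -> recurrent -> upsample).

    With a length-4 kernel and stride 2, each block preserves the length
    of an even-length input, so the output lines up sample-for-sample
    with the input waveform.
    """
    x = list(noisy)
    for _ in range(n_blocks):
        z = conv1d_strided(x, kernel, stride)   # short-term, local context
        z = simple_rnn(z)                       # long-term, sequential context
        x = transposed_conv1d(z, kernel, stride)
    return x


random.seed(0)
noisy = [math.sin(2 * math.pi * t / 8) + random.gauss(0, 0.1) for t in range(16)]
enhanced = toy_enhance(noisy)
print(len(noisy), len(enhanced))  # both lengths are 16
```

Because the weights are untrained, the output is not actually denoised; the sketch only shows how a convolutional stage and a recurrent stage can be chained so that a waveform maps directly to a waveform of the same length, without an intermediate T-F representation.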
