AS, LG, SD, SP | Apr 15, 2019

RHR-Net: A Residual Hourglass Recurrent Neural Network for Speech Enhancement

arXiv:1904.07294v1 | 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses speech enhancement for audio processing applications, offering an incremental improvement over existing waveform-based approaches.

The paper tackles speech enhancement by proposing a fully-recurrent hourglass-shaped neural network with residual connections that processes waveforms directly, avoiding spectrogram limitations. It outperforms state-of-the-art methods across six evaluation metrics.

Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss. Previous work has overcome these issues by using convolutional networks to learn long-range temporal correlations across high-resolution waveforms. These models, however, are limited by memory-intensive dilated convolution and aliasing artifacts from upsampling. We introduce an end-to-end fully-recurrent hourglass-shaped neural network architecture with residual connections for waveform-based single-channel speech enhancement. Our model can efficiently capture long-range temporal dependencies by reducing the feature resolution without information loss. Experimental results show that our model outperforms state-of-the-art approaches in six evaluation metrics.
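
The abstract does not spell out how the resolution is reduced without losing information. One lossless option, assumed here purely for illustration, is to stack consecutive time steps into the feature axis, so the sequence gets shorter while every sample is kept. The sketch below shows only that reshaping step on a toy waveform; the helper names fold_time and unfold_time are hypothetical, and no recurrent layers or residual connections are modelled.

```python
import numpy as np


def fold_time(x, factor):
    """Shorten the time axis by `factor` by stacking consecutive
    time steps into the feature axis (hypothetical helper)."""
    steps, feats = x.shape
    if steps % factor != 0:
        raise ValueError("number of time steps must be divisible by factor")
    return x.reshape(steps // factor, feats * factor)


def unfold_time(x, factor):
    """Inverse of fold_time: spread stacked features back over time."""
    steps, feats = x.shape
    if feats % factor != 0:
        raise ValueError("number of features must be divisible by factor")
    return x.reshape(steps * factor, feats // factor)


# Toy single-channel "waveform": 8 samples, 1 feature per sample.
wave = np.arange(8, dtype=np.float32).reshape(8, 1)

# Contracting half of an hourglass: resolution drops, feature width grows.
down = fold_time(wave, 2)        # shape (4, 2)
bottleneck = fold_time(down, 2)  # shape (2, 4)

# Expanding half: undo the folds in reverse order.
restored = unfold_time(unfold_time(bottleneck, 2), 2)

print(down.shape, bottleneck.shape, np.array_equal(restored, wave))
```

Because the fold is a pure reshape, the round trip reproduces the input exactly. In a real model a recurrent layer would sit between the folds, and that layer would then run over a sequence a quarter as long at the bottleneck.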

Code Implementations: 2 repos
