ASSD Oct 26, 2019

Image to Image Translation based on Convolutional Neural Network Approach for Speech Declipping

arXiv:1910.12116v1
6 citations
Originality: Incremental advance
AI Analysis

This work addresses the degradation of speech quality caused by clipping in audio processing applications and represents an incremental improvement over existing methods.

The paper tackles speech declipping by using a U-Net convolutional neural network to translate magnitude spectrum images of clipped signals into those of clean signals. It reports superior performance on quality and intelligibility measures, especially under severe clipping and in noisy conditions.

Clipping, a common nonlinear distortion, often occurs due to the limited dynamic range of audio recorders. It degrades speech quality and intelligibility and adversely affects the performance of speech and speaker recognition. In this paper, we focus on enhancing clipped speech by using a fully convolutional neural network, U-Net. Motivated by the idea of image-to-image translation, we propose a declipping approach, the U-Net declipper, in which the magnitude spectrum images of clipped signals are translated to the corresponding images of clean ones. The experimental results show that the proposed approach outperforms other declipping methods in terms of both quality and intelligibility measures, especially in severe clipping cases. Moreover, the superior performance of the U-Net declipper over well-known declipping methods is verified in additive Gaussian noise conditions.
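
To make the input and target of such an image-to-image mapping concrete, the sketch below simulates hard clipping and computes the magnitude spectrum "images" for a clean and a clipped signal. It is only an illustration, not the authors' pipeline: the sine-wave stand-in for speech, the clipping threshold, the frame length, the hop size, and the helper names `hard_clip` and `magnitude_spectrogram` are all assumptions that do not come from the paper.

```python
import numpy as np


def hard_clip(x, threshold):
    """Hard-clip samples to the range [-threshold, threshold] (hypothetical helper)."""
    return np.clip(x, -threshold, threshold)


def magnitude_spectrogram(x, frame_len=512, hop=128):
    """Return the STFT magnitude as a 2-D array of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft along each frame, then transpose so rows are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic stand-in for a speech signal: one second of a 220 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
clean = 0.8 * np.sin(2 * np.pi * 220 * t)

# Severe clipping: the threshold is well below the signal's peak amplitude
clipped = hard_clip(clean, threshold=0.3)

# A declipping network of the kind described would take clipped_img as input
# and be trained to produce clean_img as output.
clean_img = magnitude_spectrogram(clean)
clipped_img = magnitude_spectrogram(clipped)
```

Comparing the two arrays shows what the network has to undo: clipping flattens the waveform peaks, which spreads energy into harmonics that are absent from the clean magnitude spectrum.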

